Re: Empty message header and body

2020-07-15 Thread ad...@extremeshok.com



Do you know if the IV or key value changed?

I.e., did you copy the key file from the old server, and did you use the old
server's config or the old server's IV value?



> On 16 Jul 2020, at 00:43, Marcelo Machado  wrote:
> 



Re: AW: Indexation of Excel files newer than 2007

2019-05-08 Thread ad...@extremeshok.com
Piler should drop all those outdated libraries and use 
https://tika.apache.org/

> On 08 May 2019, at 10:52, Katterl Christian  wrote:
> 
> In at least my case, this does not seem to work.
>  
> BR, Christian
>  
>  
>  
> From: Janos SUTO
> Sent: Monday, 6 May 2019 11:33
> To: Piler User
> Subject: Re: Indexation of Excel files newer than 2007
>  
> Newer office files, eg. xlsx, etc should be handled internally by the parser, 
> provided that you have libzip package installed as well as the header files, 
> libzip-dev or similar.
> 
> Janos
> From: Katterl Christian 
> Sent: Mon May 06 10:19:07 GMT+02:00 2019
> To: Piler User 
> Subject: AW: Indexation of Excel files newer than 2007
> 
>  
> Hello again,
>  
> 
> for docx, there would be: https://github.com/ankushshah89/python-docx2txt
>  
> 
> Unfortunately, I am not a software developer, so I cannot make the adaptations myself.
>  
> 
> BR Christian
>  
> 
>  
> 
> From: Martin Nadvornik
> Sent: Monday, 6 May 2019 09:46
> To: Piler User
> Subject: Re: Indexation of Excel files newer than 2007
>  
> 
> Hello Christian,
> 
> catdoc is not capable of processing new office formats. As far as I know 
> there is no intention for catdoc to implement this in the foreseeable future. 
> The same problem exists for xls2csv. You could theoretically try to call 
> unoconv (https://github.com/unoconv/unoconv) before catdoc, but it will 
> probably have a big performance impact since it launches libre office / open 
> office for the conversion. But if you try this I would be interested in your 
> results since being limited to index only old office formats is also 
> something we would like to overcome. Alternatively if you can find an open 
> source software which is capable of efficiently extracting plain text from 
> current office formats it should be easily implementable into piler 
> (basically a few lines in extract.c as far as I can tell). For excel there is 
> https://github.com/xevo/xls2csv and https://github.com/nagirrab/xls2csv which 
> claim to be capable of processing xlsx files. But I haven't looked into them 
> yet.
> 
> Kind Regards
> Martin
> 
> On 06.05.2019 at 06:45, Katterl Christian wrote:
> Hello,
>  
> Indexation of Excel files newer than Excel 2007 fails in my installation.
> I am using catdoc 0.95 and it tells:
>  
> This file looks like ZIP archive or Office 2007 or later file.
> Not supported by catdoc
>  
> The Excel-File has been created using Excel 2010.
>  
> BR, Christian
> 
> 
> Christian Katterl
> Teamleader Technical IT 
> 
> 
> 
> Asamer Baustoffe AG
> Unterthalham Straße 2
> 4694 Ohlsdorf
> Austria
> tel  +43 50 799 - 2511
> mobile +43 664 811 54 99
> email c.katt...@asamer.at
> www.abag.at
> 
> This message is confidential. It may not be disclosed to, or used by, anyone 
> other than the addressee. If you receive this message by mistake, please 
> advise the sender.
> Firmenbuch: Landesgericht Wels, FN: 407726y, ATU 68646334
> 
>  
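
The catdoc message quoted in this thread ("looks like ZIP archive or Office 2007 or later file") reflects the fact that an .xlsx file is a ZIP container of XML parts. As a rough illustration only (this is Python, not piler's C parser and not Tika), the sketch below reads the shared-string table from `xl/sharedStrings.xml`. The function name `xlsx_shared_strings` is hypothetical, and numbers or inline strings stored in the worksheet parts are deliberately ignored.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts inside an .xlsx container.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def xlsx_shared_strings(data: bytes) -> list[str]:
    """Return the shared-string table of an .xlsx file given as bytes.

    Hypothetical helper for illustration only: numbers and inline
    strings live in the worksheet parts and are not extracted here.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        if "xl/sharedStrings.xml" not in zf.namelist():
            return []  # workbook without any shared strings
        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    strings = []
    for si in root.iter(f"{NS}si"):
        # A string item may be split into several <t> runs (rich text).
        strings.append("".join(t.text or "" for t in si.iter(f"{NS}t")))
    return strings
```

Calling `xlsx_shared_strings(open(path, "rb").read())` on a workbook returns the distinct text values it contains, which is usually the part worth indexing.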


Re: aftermath for a migration

2017-11-10 Thread ad...@extremeshok.com
I've done more than 50 multi-TB migrations. 

Pretty much: create a tar, then use rsync with tweaked encryption settings to 
speed up the transfer.

Nowadays we just do a ZFS sync of the pool (disk storage) to the new server's 
ZFS; it took a little over 7 hours to transfer 20 TB.
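
To put the figure above in perspective, here is a small back-of-the-envelope calculation. It assumes decimal units (20 TB = 20 * 10**12 bytes) and exactly 7 hours, neither of which is stated precisely in the message, so the result is only an estimate.

```python
def throughput(bytes_moved: float, seconds: float) -> tuple[float, float]:
    """Return (MB/s, Gbit/s) using decimal units."""
    bytes_per_s = bytes_moved / seconds
    return bytes_per_s / 1e6, bytes_per_s * 8 / 1e9


# Assumption: 20 TB means 20 * 10**12 bytes and the run took exactly 7 hours.
mb_s, gbit_s = throughput(20e12, 7 * 3600)
print(f"{mb_s:.0f} MB/s, {gbit_s:.2f} Gbit/s")
```

Under those assumptions the average rate is a little over 6 Gbit/s, which is more than a single 1 Gbit link can carry, so numbers like these presume a faster interconnect between the two servers.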


_ eXtremeSHOK.com _

> On 10 Nov 2017, at 15:57, Federico SECURITY LINE  
> wrote:
> 
> Why not make a tarball of the whole original dir, copy the .tgz file to the 
> destination, untar it, then rsync to sync the files that changed in the meantime?
> 
> Thanks,
> F.
> 
> 
> On 10 November 2017 11:36:12 CET, s...@acts.hu wrote:
>> 
>> 
>> 
>> Dear piler users,
>> 
>> I had the pleasure (so to speak) of participating in a
>> piler migration project from hostA to hostB. Both hosts
>> are on the same datacenter, the network bandwidth is unknown
>> to me, but we may assume it's Gbit.
>> 
>> There was 2+ TB data to migrate, millions of files to copy.
>> 
>> Sftp was chosen as the copying method, and I can tell you
>> that copying the /var/piler/store dirs and files was taking
>> several days (not 2-3, rather many more).
>> 
>> So my conclusion is that using sftp (or even rsync, I believe)
>> is painful to migrate a piler archive to another host because
>> of the lots of small files.
>> 
>> I've been thinking how to make such possible future migrations
>> both easier and faster. I think such a migration would be less
>> painful if the data in /var/piler/store/00/... dirs were not
>> in lots of files.
>> 
>> One possible solution is to use sqlite3 files, read on.
>> 
>> You know that the top level dirs in /var/piler/store/00 hold
>> ~12 days of data. After that it doesn't change, piler starts writing
>> new emails to the next directory, nowadays it's 5a0. So what if we
>> could move all files in 59f to 59f.sdb, all files in 59e dir to 59e.sdb,
>> etc?
>> 
>> Then after 4 years you may end up with 4 * 365 / 12 =~ 120 big sdb files,
>> and the latest top level dir with the lots of small files (though much
>> fewer files compared to files in the 120 top level dirs).
>> 
>> So the sdb files would be big, but copying 120 large files to another host
>> is way much easier than 15 million smaller files.
>> 
>> OK, now the question is how to move data to these sdb files? The plan is
>> to create a utility to iterate through the top level dirs mentioned
>> before, and write the file contents to sqlite3 db files. Finally remove
>> the .m and .a* files successfully copied.
>> 
>> The only performance penalty (after writing the sdb files) comes to my
>> mind is that pilerget must first open the sdb file, and if it's not
>> present (either someone is not interested in this [optional]
>> consolidation or this is the last top level dir which has not been
>> consolidated yet), then get the file from the filesystem.
>> 
>> 
>> Another possible solution is to put all email data blob to mysql. There
>> would be a nice table, eg. maildata with 2-3 columns and the last column
>> as a huge blob.
>> In this case instead of having 2-3 TB (in the case I mentioned) data and
>> several million files, you would have a very large mysql table with
>> varying sizes of blob data in each rows.
>> I'm not sure if it's a good idea. In this case instead of using mysqldump
>> it would be much easier to stop mysqld and copy the raw db file to the
>> other host.
>> 
>> 
>> Before moving to either path I'd like to hear your comments, ideas on the
>> topic.
>> 
>> Note: using sqlite data files is intended to be optional, not forcing
>> anyone to make this step at all. However, I believe it would be great
>> allowing you to migrate piler easier than today.
>> 
>> 
>> Janos
>> 
>> PS: Perhaps I introduce some bias with the following info, some of you may
>> already know:
>> mailarchiva uses 1024 (or so) zip files to hold the encrypted files.
>> 
> 
> -- 
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
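
The sqlite3 consolidation idea proposed in the quoted message can be sketched roughly as follows. This is not piler code: the table layout (`blobs` with `name` and `data` columns) and the function names `consolidate` and `fetch` are assumptions made for illustration, and the real utility would also have to handle encryption, verification and removal of the copied files.

```python
import sqlite3
from pathlib import Path
from typing import Optional


def consolidate(top_dir: Path, sdb_path: Path) -> int:
    """Copy every .m / .a* file under top_dir into one sqlite3 file.

    Returns the number of files stored. Source files are left in place;
    removing them after a verified copy is a separate step.
    """
    con = sqlite3.connect(sdb_path)
    with con:  # commit on success, roll back on error
        con.execute(
            "CREATE TABLE IF NOT EXISTS blobs (name TEXT PRIMARY KEY, data BLOB)"
        )
        count = 0
        for path in sorted(top_dir.rglob("*")):
            if path.is_file() and (
                path.suffix == ".m" or path.suffix.startswith(".a")
            ):
                con.execute(
                    "INSERT OR REPLACE INTO blobs VALUES (?, ?)",
                    (path.name, path.read_bytes()),
                )
                count += 1
    con.close()
    return count


def fetch(name: str, top_dir: Path, sdb_path: Path) -> Optional[bytes]:
    """Look in the sdb file first, then fall back to the filesystem."""
    if sdb_path.exists():
        con = sqlite3.connect(sdb_path)
        try:
            row = con.execute(
                "SELECT data FROM blobs WHERE name = ?", (name,)
            ).fetchone()
        except sqlite3.OperationalError:
            row = None  # file exists but has no blobs table
        finally:
            con.close()
        if row is not None:
            return row[0]
    for path in top_dir.rglob(name):
        return path.read_bytes()
    return None
```

The lookup order in `fetch` mirrors the behaviour described for pilerget: try the consolidated file, and only if it is missing (or does not hold the entry) read the individual file from the directory tree.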


Re: Restoring archive in new installation

2017-01-05 Thread ad...@extremeshok.com
Basically you can boot into rescue mode and repair the boot loader to fix your 
original system, i.e. get it booting.

Another alternative would be to mount the "drives" on a working os to recover 
the piler config and vector values, download the message store and try and make 
a database backup.

If you have the config + vector + messages without the database I have code 
which will rebuild your archive.

_ eXtremeSHOK.com _

> On 05 Jan 2017, at 5:35 AM, Rhys Andrews  wrote:
> 
> Hi everyone,
> Please excuse my ignorance as I don’t know my way around the Linux 
> environment too well.
> 
> I  configured MailPiler on a virtual machine using VirtualBox, following the 
> instructions here:
> http://www.mailpiler.org/wiki/current:vmware-ova
> 
> I created a second volume for additional space as per the suggestion. As I 
> understand it, the entire /var directory is stored on that second volume? 
> 
> It worked great for a few months, but after a power outage last week, the 
> virtual machine would no longer boot (it would get stuck at ‘welcome to 
> grub!’). As far as I know, only the first volume (the bootable one) is 
> corrupted, but the second volume (containing /var, the archive?) could still 
> be intact.
> 
> I have rebuilt MailPiler with new VMDK files and it appears to be working OK, 
> just without any of the original emails in the archive.
> My question is, would I be able to restore the archive from the broken second 
> volume? And if so, can somebody point me in the right direction?
> 
> (Yes, I should have kept a backup - but it was technically a test build, and 
> now I am ready for production).
> 
> Thanks in advance!
> Rhys
> 
> 
>   Rhys Andrews | I.T Administrator | Wycliffe Christian School
> Ph: (02) 4753 6422 +911 | E-mail: randr...@wycliffe.nsw.edu.au
> Address: 133 Rickard Road | Warrimoo 2774 | NSW Australia
> 
> 
> 


Re: Archive size

2016-08-12 Thread ad...@extremeshok.com
The left behind attachments are deduplicated attachments which could be needed 
for other emails.

We have a commercial tool which will scan all the attachments and check if they 
are orphaned.
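
To illustrate what "orphaned" means for deduplicated attachments, here is a minimal sketch. It is neither the commercial tool mentioned above nor piler's own logic. It assumes that `ptr` = 0 marks the row that owns the stored file and that a non-zero `ptr` holds the `id` of the owning row; that reading of the `ptr` column is an assumption, as are the `Attachment` class and the `orphaned_files` function.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    id: int
    piler_id: str
    ptr: int  # assumption: 0 = this row owns the stored file, else id of the owner row


def orphaned_files(rows: list[Attachment], deleted_ids: set[str]) -> list[int]:
    """Return ids of owner rows whose stored file no live message needs."""
    live_owners = set()
    for row in rows:
        if row.piler_id not in deleted_ids:
            # A live message keeps its own file, or the file it points to.
            live_owners.add(row.id if row.ptr == 0 else row.ptr)
    return [r.id for r in rows if r.ptr == 0 and r.id not in live_owners]
```

Under these assumptions, a file left on disk after its own message was purged is not orphaned as long as some other, still-live message points at it.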

Also check your /var/log as they can grow insanely large if you have debugging 
enabled.

Logrotate will compress your logs weekly, which reduces their size and 
also helps curb the growth.


_ eXtremeSHOK.com _

> On 12 Aug 2016, at 1:58 PM, Konstantin  wrote:
> 
> Has anyone used the 'inplace_enable' option for sphinx?
> It seems it would help me fix the growing disk usage during the indexer process.
> 
>> On Fri, Aug 12, 2016 at 2:35 PM, Konstantin  wrote:
>> Hello Janos,
>> 
>> We have an archive with the following version: piler 0.1.25-master-branch,
>> build 869.
>> The server has a 2T hdd, of which 1.5T is currently used.
>> Usually 420G is available, but while indexer.main.sh is running, free disk
>> space drops to 90G.
>> That is too low, so I started looking for what is taking up the space on
>> my drive.
>> 
>> Here are the questions I have.
>> 1) I read the documentation for sphinx and understand why the disk is used
>> so heavily during the merge from dailydelta1 to main1.
>> Currently we have indexer.main.sh scheduled at 2am each day.
>> I would like to run it on Saturdays only. Over a week that would be ~180k emails.
>> Can that cause any issues during the merge?
>> 
>> 2) We have pilerpurge scheduled. Old emails are removed from search, and I
>> thought that they were removed from disk as well.
>> I looked for an example attachment and found this file:
>> /var/piler/store/00/526/00/00/4000526faa5a1602877400bf63cb.a3
>> 
>> In piler.metadata message with piler_id=4000526faa5a1602877400bf63cb 
>> has deleted=1
>> In piler.attachment i also see results for that piler_id:
>> *** 1. row ***
>>  id: 79578
>>  piler_id: 4000526faa5a1602877400bf63cb
>>  attachment_id: 3
>>  name: medshoppe
>>  type: application/octet-stream
>>  sig: 37cfd3b8d394d791780d2c478f98239ad0fd5b656d4e7d52c64035dea76e86de
>>  size: 3944
>>  ptr: 0
>>  deleted: 0
>> 
>> The wiki says that pilerpurge removes old messages.
>> I suppose that piler also removes the attachments.
>> Is that correct?
>> Can you please confirm that piler build 869 removes old messages from disk?
>> 
>> Thank you.
>> --
>> This message was delivered using 100% recycled electrons.
> 
> 
> 
> -- 
> This message was delivered using 100% recycled electrons.


Re: no mails can be found after imap import

2016-06-30 Thread ad...@extremeshok.com

Please watch your language.

What do you want to achieve by managing the folders?



_ eXtremeSHOK.com _

> On 30 Jun 2016, at 1:38 PM, Joe Rady  wrote:
> 
> Thanks,
> I've managed to get some results now. Excellent!
> 
> Is there a way to manage folders in the GUI?
> 
> Also, how do I assign subfolders properly?
> I see that the table folders contains the column 'parent_id'.
> I tried to set this by issuing
> 
> update folder set parent_id = 34 where id = 10;
> 
> As a result the GUI hides the folder with id 10 (and its content for 
> searches).
> The folder 34 appears blank.
> 
> (hope I didn't fuck up things)
> Thanks so much,
> Joe
> 
> 
> 
> 
> 
> 
>> On 29.06.2016 at 20:36, Janos SUTO wrote:
>> 
>> Hello,
>> 
>> this is the problem:
>> 
>> SELECT id FROM main1,dailydelta1,delta1 WHERE   folder IN () AND
>>   ^^^
>> 
>> To fix it you have to login as administrator and assign some folders to the 
>> given user. He should relogin, and verify on the settings page that he has 
>> his folders assigned.
>> 
>> Janos
>> 
>> 
>>> On Wed, 29 Jun 2016, Joe Rady wrote:
>>> 
>>> Hi,
>>> a Newbie here:
>>> I downloaded the ovf, started it up, made some settings, most importantly 
>>> the support for folders.
>>> (my users are VERY fond of their imap folders)
>>> went into the web-gui and set up some users plus domains etc.
>>> everything looked nice.
>>> then i executed
>>> pilerimport -i imap.fleuchaus.com -u xx -p XxXxXx -R -P 993
>>> inside the folder /home/piler/
>>> it was counting the mails, and threw some error messages that some mails 
>>> could not be imported.
>>> i did look at the web Gui (health monitor) and found the correct number of 
>>> received mails and also a few hundred of duplicate mails (which does make 
>>> sense)...
>>> also i see space projection and all that
>>> however, when searching for mails i get none.
>>> my search in the mails list brought some ideas, but no solution.
>>> it appears that there are mails occupying space but they cannot be found.
>>> mail.log has some entries that I cannot make sense of:
>>> a whole bunch of lines like:
>>>  "Jun 27 15:10:23 piler pilerimport[3815]: error: helper: execl"
>>> which with my very limited knowledge of C tells me that something could not 
>>> be executed?
>>> and a large number of
>>> "Jun 27 15:13:51 piler pilerimport[3207]: 
>>> 4000577126992ee13a74008e6e435bad: cannot open: 
>>> 4000577126992ee13a74008e6e435bad.m
>>> Jun 27 15:13:51 piler pilerimport[3207]: 
>>> 4000577126992ee13a74008e6e435bad: error storing message: 
>>> 4000577126992ee13a74008e6e435bad.m"
>>> later on in the day i tried importing from a different imap account
>>> and got a lot of
>>> "Jun 27 17:17:15 piler reindex[6042]: 4000577124f62c8a32e400d171e161bc: 
>>> mysql_stmt_execute error: *Duplicate entry '780' for key 'PRIMARY'* (errno: 
>>> 1062)"
>>> the indexer in the crontab seems to do its job, but the log reports the 
>>> same over and over:
>>> "Jun 29 11:05:15 piler piler-webui[1921]: sphinx query: 'SELECT id FROM 
>>> main1,dailydelta1,delta1 WHERE   folder IN () AND  MATCH('') ORDER BY 
>>> `sent` DESC LIMIT 0,20 OPTION max_matches=1000' in 0.00 s, 0 hits, 0 total 
>>> found
>>> So although the health monitor indicates existing mails, the web gui cannot 
>>> find them.
>>> My apologies if I've overlooked some clues on the mailing list or on the 
>>> website; I've tried to find some.
>>> Best regards
>>> Joe Rady
>>> Head of IT
>>> Fleuchaus & Gallo Partnerschaft mbB
>>> __
> 
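
The logged query in this thread contains `folder IN ()`, i.e. an empty folder list, which is what Janos points at. As an illustration of guarding against that case (a Python sketch, not the piler web UI code, with the hypothetical helper name `folder_filter`), the folder clause could be built like this:

```python
def folder_filter(folder_ids: list[int]) -> str:
    """Build the folder part of a search query, refusing an empty list."""
    if not folder_ids:
        # An empty IN () list can never match, so fail loudly instead.
        raise ValueError("user has no folders assigned; nothing to search")
    return "folder IN (" + ",".join(str(int(f)) for f in folder_ids) + ")"
```

Failing early with a clear message makes the missing folder assignment visible, instead of a search that silently returns zero hits.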



Re: 3 Message with wrong 2036 date

2016-03-03 Thread ad...@extremeshok.com
Please list the steps

Thanks

Sent from my iPhone

> On 03 Mar 2016, at 7:11 PM, Tim Stumbo  wrote:
> 
> Edwin, I can send you the steps I took to fix my dates if you like. 
> 
> Thanks 
> 
>> On Thursday, March 3, 2016, Janos SUTO  wrote:
>> Hello Edwin,
>> 
>> can you show me the headers for these emails? In Tim's case it turned out
>> there were some bogus Date: headers, and we managed to fix it with a
>> manual sphinx update.
>> 
>> Janos
>> 
>>> On 2016-02-18 17:17, Nichols, Edwin (BeneFACT) wrote:
>>> Strangely, I have the same issue with 8 emails. The emails themselves
>>> show a date of 2010, but the Piler interface shows them as being from
>>> 2036.  There may be a bug?
>>> 
>>> EDWIN NICHOLS  |  _Director – IT & Operations_
>>> 
>>> 416-360-SRED (7733) ext.161
>>> Vancouver: 604-628-9870
>>> Calgary: 403-775-7565
>>> Toll Free: 1-855-TAX-BACK (829-2225)
>>> 647-689-3127
>>> edwin.nich...@benefact.ca
>>> WWW.BENEFACT.CA [1]
>>> 
>>> PLEASE CONSIDER THE ENVIRONMENT BEFORE PRINTING THIS EMAIL
>>> 
>>> -
>>> 
>>> Privacy Disclaimer
>>> This e-mail message (including attachments, if any) is intended for
>>> the use of the individual or entity to which it is addressed and may
>>> contain information that is privileged, proprietary, confidential. If
>>> you are not the intended recipient, you are notified that any
>>> dissemination, distribution, or copy of this communication is strictly
>>> prohibited. If you have received this communication in error, please
>>> notify the sender and erase this e-mail message immediately.
>>> 
>>> FROM: Tim Stumbo [mailto:timstu...@gmail.com]
>>> SENT: Thursday, February 18, 2016 9:45 AM
>>> TO: piler-user@list.acts.hu
>>> SUBJECT: 3 Message with wrong 2036 date
>>> 
>>> I have 3 messages that always appear at the top of the search field
>>> due to them having an incorrect date. The year on the messages is
>>> 2036. I would like to either delete these 3 e-mails or fix the index
>>> for them with the corrected date.
>>> 
>>> Could someone help me with this issue?
>>> 
>>> Thanks
>>> 
>>> Links:
>>> --
>>> [1] http://www.benefact.ca/
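
Since bogus `Date:` headers were the cause in Tim's case, a quick way to spot candidates before indexing is to parse the header and flag anything unparsable or in the future. The sketch below uses Python's standard `email.utils.parsedate_to_datetime`; the function name `suspicious_date` and the one-day tolerance are assumptions, and this is not how piler itself handles dates.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def suspicious_date(header: str, now: Optional[datetime] = None) -> bool:
    """True if a Date: header is unparsable or lies more than a day ahead."""
    now = now or datetime.now(timezone.utc)
    try:
        parsed = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return True  # older Pythons raise TypeError, newer ones ValueError
    if parsed.tzinfo is None:
        # "-0000" headers come back naive; treat them as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed > now + timedelta(days=1)
```

Messages flagged this way are the ones worth inspecting by hand before deciding on a manual sphinx update.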


Re: pilerimport Syntax for importing from Exchange with impersonation because readpst skips many messages

2015-07-21 Thread ad...@extremeshok.com
In Exchange 2010, you use the following format for the username:

DomainName\Username\Alias 

to log into the shared mailbox
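
As a small illustration of that username format, the sketch below joins the three parts with literal backslashes and builds an argument list using the pilerimport options that appear elsewhere in this archive (`-i`, `-u`, `-p`, `-R`, `-P`). The helper names and the example values are hypothetical, and whether a given Exchange server accepts this login over IMAP has to be verified on site.

```python
def shared_mailbox_login(domain: str, user: str, alias: str) -> str:
    """Join the three parts with literal backslashes."""
    return "\\".join([domain, user, alias])


def pilerimport_argv(
    host: str, login: str, password: str, port: int = 993
) -> list[str]:
    """Argument list mirroring the pilerimport call quoted in this archive."""
    return [
        "pilerimport", "-i", host, "-u", login, "-p", password,
        "-R", "-P", str(port),
    ]


# Hypothetical values: EXAMPLE is the domain, archiver the impersonation
# account, jdoe the alias of the mailbox to import.
login = shared_mailbox_login("EXAMPLE", "archiver", "jdoe")
argv = pilerimport_argv("imap.example.com", login, "secret")
```

Passing `argv` as a list to `subprocess.run` avoids a shell in between, so the backslashes in the login reach pilerimport unchanged.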


Sent from my iPhone

> On 21 Jul 2015, at 9:15 PM, Joern Quillmann, kuehlhaus AG
> j.quillm...@kuehlhaus.com wrote:
> 
> Hi Everyone,
> 
> I'm running piler on Ubuntu 15.04 for a few days now and I'm trying to import
> all of our current mailboxes to piler.
> I exported all mailboxes from Exchange to pst files and tried to convert them
> with  readpst -M -D -b -o /var/piler/tmp/userXYZ /var/piler/tmp/userXYZ.pst.
> Readpst skips a lot of messages that are definitely there when I open the pst
> files in Outlook. The sent items folder for example never gets converted.
> Readpst always skips all messages from this folder and some from other
> folders too.
> 
> So I want to try importing from IMAP with pilerimport. As I don't have the
> user passwords I need to use impersonation with our Exchange 2010.
> 
> I can setup a user to have the impersonation right with Exchange but how do I
> tell pilerimport which mailbox I want to import when I log into the IMAP
> Server with the impersonation-user?
> I'm currently at a loss for the right syntax.
> 
> Regards
> Joern Quillmann
 



Re: How scaleable is mailpiler?

2014-01-30 Thread ad...@extremeshok.com
You're overthinking the whole thing by trying to scale a single instance.

Create multiple virtual machines on something like OpenStack, etc.

1 VM per client/company; scale the storage and resources as needed.

Client leaves, kill the VM.

This is how we do ours: far less maintenance and fewer issues than trying to 
have a single large instance.


https://extremeshok.com




Sent from my iPhone 5

> On 30 Jan 2014, at 3:44 PM, Janos SUTO s...@acts.hu wrote:
> 
> 
> Hello Daryl,
> 
> On 2014-01-30 13:37, Darryl Sutherland wrote:
>> We are investigating to see if mailpiler is a suitable replacement
>> for MailArchiva. A few weeks ago I made a feature request to extend
>> piler's authentication to generic ldap, and this feature was released
>> in 2 days. This great responsiveness on your part was very well
>> received, and the generic ldap authentication is working quite well.
> 
> Ahh, I remember, I'm glad to hear that
> 
>> However, because mail ingestion and client churn generate a lot of
>> activity in ISP scale environments, a single server installation would
>> incur heavy database and disk activity when provisioning and purging
>> data for new and cancelled clients. What is the recommended
>> architecture to scale piler up to 20 000 users and beyond?
> 
> so far the biggest piler installation I'm aware of is for a company with
> ~4000 users, ~10 million emails, and roughly 1 TB stored data.
> 
> 
>> How would you architect Mailpiler to provide highly scalable
>> capacity, performance and availability? Bearing in mind the possible
>> impact of ingestion and data purging on both the file system and the
>> database. Do you have any examples or case studies on clients from
>> large enterprises who have moved to mailpiler, and how did they meet
>> the architectural requirements?
> 
> the company mentioned above has provided a single virtual machine for piler
> to archive emails. (Actually they have another piler VM at another datacenter,
> just in case the primary site goes down, but these two VMs are independent).
> 
> I just saw your email archiving product line on your site, and if you offer
> 10+ years of retention for 20k users (and perhaps more), then we definitely
> need a distributed solution where the huge data is somehow separated, either
> based on functionality (sphinx, mysql, emails) or user ids, or even years.
> 
> However I haven't been part of such a huge project yet, so I don't have much
> experience with the area (yet), although I have some (yet unproved) ideas.
> 
> If you have some time and resources to experiment with piler, I would work
> together with you to come up with an enhanced version of piler that supports
> such a huge setup you have. However expect a somewhat longer ETA than 2 days
> to finish it :-)
> 
> For starters what hardware infrastructure will you provide for a piler based
> email archiving? Would it be a virtualized environment with a shared storage
> like netapp? Or several computers with some local storage? How many users
> perform searches at the same time?
> 
> Btw. just for being curious, what's the layout of the current mailarchiva
> based system for the load you described?
> 
> 
> Best regards,
> Janos