Re: pilerexport where_condition query length limitation

2021-05-03 Thread Ryan Blenis
Follow up for anyone needing this in the future:

It looks like you can safely change
~line 159 of pilerexport.c (in the run_query function) from:
char s[SMALLBUFSIZE];
to
char s[MAXBUFSIZE];
to allow a longer SQL query with pilerexport (the buffer grows from 512 to
8192 characters).

That said, the existing mechanism should probably raise a more prominent
error when the query length exceeds the buffer length. Currently the
failure is only logged in mail.log, while pilerexport itself just appears
to come back with 0 matches.
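
A minimal sketch of the kind of check suggested above, assuming the query
is formatted with snprintf. The helper name build_query and the SELECT text
are hypothetical and not taken from pilerexport.c; only the two buffer
sizes (512 and 8192) come from this thread.

```c
#include <stdio.h>

#define SMALLBUFSIZE 512   /* old size of s, as stated in this thread */
#define MAXBUFSIZE 8192    /* new size of s, as stated in this thread */

/*
 * Hypothetical helper (not piler's code): formats a query into buf.
 * Returns 0 on success, or -1 if the query did not fit, after printing
 * an error to stderr so the failure is visible on the command line and
 * not only in the log.
 */
int build_query(char *buf, size_t size, const char *where_condition)
{
    /* Placeholder statement text; the real statement is an assumption. */
    int n = snprintf(buf, size, "SELECT id FROM example_index WHERE %s",
                     where_condition);

    /* snprintf returns the length the full string would have had. */
    if (n < 0 || (size_t)n >= size) {
        fprintf(stderr, "error: query needs %d bytes, buffer holds %zu\n",
                n + 1, size);
        return -1;
    }
    return 0;
}
```

With a 1000-character where condition, this returns -1 for a SMALLBUFSIZE
buffer and 0 for a MAXBUFSIZE buffer, instead of silently running a
truncated statement.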


On Mon, May 3, 2021 at 6:13 PM Ryan Blenis  wrote:

> Hi Janos,
>
> Running some queries with pilerexport -w and it seems I'm running up
> against a wall with longer queries (the newly logged "pilerexport" sphinx
> query in the mail.log cuts off the query at about ~510 characters) which
> causes the query to fail and the following SQL just starts spewing
> mysql_stmt errors because the SQL is "incomplete". Assuming that's actually
> 512 characters with a null termination and another I can't determine just
> yet, but looking to hopefully bump up this limit and thought it would also
> be a quick bug fix on your end (and make it so I don't have to dig through
> too much code to see what needs to be done to make that happen =D )
>
> Thank you!
>


pilerexport where_condition query length limitation

2021-05-03 Thread Ryan Blenis
Hi Janos,

I'm running some queries with pilerexport -w and hitting a wall with longer
queries. The newly logged "pilerexport" sphinx query in mail.log is cut off
at about 510 characters, which causes the query to fail, and the SQL that
follows then spews mysql_stmt errors because the statement is "incomplete".
I assume the limit is actually 512 characters, including the null
terminator and one other character I can't account for yet. I'm hoping to
bump up this limit and thought it might also be a quick bug fix on your end
(and would save me digging through too much code to see what needs to be
done to make that happen =D ).

Thank you!


Re: Pilerexport attachment numbers plus advanced querying

2021-05-02 Thread Ryan Blenis
Perfect. Thank you so much again!

On Sun, May 2, 2021 at 3:04 PM  wrote:

>
>
> The main search query is equivalent in capabilities with the 'advanced'
> search.
> In fact I personally consider it the preferred way to search. See the
> examples
> at https://www.mailpiler.org/wiki/current:using-the-gui
>
> Also consider the below query how to use it:
>
> a:pdf raw: @subject test email >>> out of office date1:2015.11.18
> date2:2015.11.18
>
> Janos
>
>
> On 2021-05-02 20:44, Ryan Blenis wrote:
> > Ah I see, the "raw: " is in the main search query but not applicable
> > to advanced searches where you can combine the query with dates and /
> > or attachment preferences. Is there any way to utilize it in the
> > advanced search (I would imagine "Raw:" would replace To, From,
> > Subject, and Body- as those are all "MATCH" query participants. (I
> > don't use tags or notes currently, so I'm not sure if they are
> > contained in the MATCH phrase. Since dates and attachments aren't in
> > MATCH (but prior in the query) it would be great to utilize both of
> > them in tandem.
> >
> > On Sun, May 2, 2021 at 2:38 PM Ryan Blenis 
> > wrote:
> >
> >> Hi Janos,
> >>
> >> Thank you! The number of attachments seems to be working perfectly,
> >> the "raw" updates don't seem to work, but maybe I'm misunderstanding
> >> the implementation. There's no new "Raw" field in the advanced
> >> search, so I tried putting "raw: testing" in the Body search field,
> >> but the sphinx query showed "@body raw: testing" versus what was
> >> expected: "testing". Am I misunderstanding the implementation and/or
> >> where to utilize this? Thank you!
> >>
> >> On Sun, May 2, 2021 at 8:28 AM  wrote:
> >>
> >>> On 2021-05-02 09:15, Ryan Blenis wrote:
> >>>
> >>>> Ideally, in the dry run, I'd like to have it print "attachment:
> >>> [id]"
> >>>> much like the emails print "id: [id]" so I can get a count of
> >>> emails
> >>>> and associated attachment numbers, which is often required in
> >>> legal
> >>>> queries as the "number of items" found for certain search
> >>> terms.
> >>>> (I'm on a slightly older version that only prints "id: [id]" for
> >>> each
> >>>> email found, though I believe you added an email found count at
> >>> the
> >>>> end of the dry run in a newer version, ideally in that case
> >>> there
> >>>> would be a count of emails, and a count of attachments as well).
> >>>
> >>> This commit gives you the total number of attachments at the
> >>> beginning
> >>> in case of dry run:
> >>>
> >>
> >
> https://bitbucket.org/jsuto/piler/commits/f2683962333a741c64920d959bc12e23ed797aa4
> >>>
> >>> Let me know if it meets your expectations, or you need the
> >>> attachments
> >>> for each
> >>> message instead.
> >>>
> >>> Janos
>
>


Re: Pilerexport attachment numbers plus advanced querying

2021-05-02 Thread Ryan Blenis
Ah I see, the "raw: " is in the main search query but not applicable to
advanced searches where you can combine the query with dates and/or
attachment preferences. Is there any way to utilize it in the advanced
search? I would imagine "Raw:" would replace To, From, Subject, and Body,
as those are all "MATCH" query participants. (I don't use tags or notes
currently, so I'm not sure if they are contained in the MATCH phrase.)
Since dates and attachments aren't in MATCH (but prior in the query), it
would be great to utilize both of them in tandem.

On Sun, May 2, 2021 at 2:38 PM Ryan Blenis  wrote:

> Hi Janos,
>
> Thank you! The number of attachments seems to be working perfectly, the
> "raw" updates don't seem to work, but maybe I'm misunderstanding the
> implementation. There's no new "Raw" field in the advanced search, so I
> tried putting "raw: testing" in the Body search field, but the sphinx query
> showed "@body raw: testing" versus what was expected: "testing". Am I
> misunderstanding the implementation and/or where to utilize this? Thank you!
>
> On Sun, May 2, 2021 at 8:28 AM  wrote:
>
>>
>>
>> On 2021-05-02 09:15, Ryan Blenis wrote:
>>
>> > Ideally, in the dry run, I'd like to have it print "attachment: [id]"
>> > much like the emails print "id: [id]" so I can get a count of emails
>> > and associated attachment numbers, which is often required in legal
>> > queries as the "number of items" found for certain search terms.
>> > (I'm on a slightly older version that only prints "id: [id]" for each
>> > email found, though I believe you added an email found count at the
>> > end of the dry run in a newer version, ideally in that case there
>> > would be a count of emails, and a count of attachments as well).
>>
>> This commit gives you the total number of attachments at the beginning
>> in case of dry run:
>>
>> https://bitbucket.org/jsuto/piler/commits/f2683962333a741c64920d959bc12e23ed797aa4
>>
>> Let me know if it meets your expectations, or you need the attachments
>> for each
>> message instead.
>>
>>
>> Janos
>>
>>


Re: Pilerexport attachment numbers plus advanced querying

2021-05-02 Thread Ryan Blenis
Hi Janos,

Thank you! The number of attachments seems to be working perfectly. The
"raw" updates don't seem to work, but maybe I'm misunderstanding the
implementation. There's no new "Raw" field in the advanced search, so I
tried putting "raw: testing" in the Body search field, but the sphinx query
showed "@body raw: testing" instead of the expected "testing". Am I
misunderstanding the implementation and/or where to utilize this? Thank you!

On Sun, May 2, 2021 at 8:28 AM  wrote:

>
>
> On 2021-05-02 09:15, Ryan Blenis wrote:
>
> > Ideally, in the dry run, I'd like to have it print "attachment: [id]"
> > much like the emails print "id: [id]" so I can get a count of emails
> > and associated attachment numbers, which is often required in legal
> > queries as the "number of items" found for certain search terms.
> > (I'm on a slightly older version that only prints "id: [id]" for each
> > email found, though I believe you added an email found count at the
> > end of the dry run in a newer version, ideally in that case there
> > would be a count of emails, and a count of attachments as well).
>
> This commit gives you the total number of attachments at the beginning
> in case of dry run:
>
> https://bitbucket.org/jsuto/piler/commits/f2683962333a741c64920d959bc12e23ed797aa4
>
> Let me know if it meets your expectations, or you need the attachments
> for each
> message instead.
>
>
> Janos
>
>


Re: Pilerexport attachment numbers plus advanced querying

2021-05-02 Thread Ryan Blenis
Hi Janos,

On Sun, May 2, 2021 at 2:24 AM  wrote:

>
>
> Hello Ryan,
>
> On 2021-05-02 01:15, Ryan Blenis wrote:
>
> > 1. Is there a way to have the pilerexport dry run output the number of
> > attachments linked to the query?
>
> Just to clarify. Let's say you have 10 matching emails with 15
> attachments
> total, and you want pilerexport to print 15?
>

Ideally, in the dry run, I'd like to have it print "attachment: [id]" much
like the emails print "id: [id]" so I can get a count of emails and
associated attachment numbers, which is often required in legal queries as
the "number of items" found for certain search terms. (I'm on a slightly
older version that only prints "id: [id]" for each email found, though I
believe you added an email found count at the end of the dry run in a newer
version. Ideally, in that case, there would be a count of emails and a
count of attachments as well.)
Thank you!


> > 2. Would it be possible in the web UI to have an "expert"
> > setting/field where you can manually enter the full sphinx MATCH()
> > string for more precise results (while still utilizing the other
> > advanced search fields)? The sphinx query language allows quite a bit
> > more flexibility and I'm finding myself using pilerexport's -w flag to
> > get a read on match numbers versus querying in the web UI because of
> > this limitation (even though you CAN overload the fields to an extent)
>
> It's possible with a little tweak, I'll add it soon, but it makes sense
> only for auditors. Otherwise it breaks security.
>

Yes, I completely agree. I'm usually in as an auditor so I almost forgot
about other roles, thank you!

>
> > 2a. Overloading the web UI search fields with a more complex query, or
> > even simply utilizing "quotes" in the field appears to break the
> > "sphinx" query viewing link. This appears to be just a simple need for
> > some escaping on the front-end.
>
> I've fixed the issue with this commit:
>
> https://bitbucket.org/jsuto/piler/commits/dacb7c85aa7654c70f97c9bd4dc80a52913d842a


Awesome! Thank you!

>
>
> Janos
>
>


Pilerexport attachment numbers plus advanced querying

2021-05-01 Thread Ryan Blenis
Hi again!

Loving the updates to pilerexport since our last chat. A couple questions
for upgrades based on requests I've had:

1. Is there a way to have the pilerexport dry run output the number of
attachments linked to the query?

2. Would it be possible in the web UI to have an "expert" setting/field
where you can manually enter the full sphinx MATCH() string for more
precise results (while still utilizing the other advanced search fields)?
The sphinx query language allows quite a bit more flexibility and I'm
finding myself using pilerexport's -w flag to get a read on match numbers
versus querying in the web UI because of this limitation (even though you
CAN overload the fields to an extent)
2a. Overloading the web UI search fields with a more complex query, or even
simply utilizing "quotes" in the field appears to break the "sphinx" query
viewing link. This appears to be just a simple need for some escaping on
the front-end.

Thank you as always!


Re: pilerexport w flag matching

2021-04-08 Thread Ryan Blenis
Hi Janos,

Thanks as always for the reply. No worries on the -w replacing the other
switches, a documentation update will clear that up pretty easily, and
thank you, I'd love a total in the dry run!

As for "The export utility assumes searchd is listening on 127.0.0.1:9306.
That part didn't change from 1.3.8 to 1.3.11.":

I looked back at all the commits and didn't see anything in that file that
could be causing it, but I'll investigate, because now it's just annoying
me that it doesn't work. I saw that it assumes searchd is on
127.0.0.1:9306, and it is on my system, yet the new build still fails with
that error. I don't know if you have a test case for pilerexport -w, but it
should be repeatable. I don't see much changing in cfg.c either, but when I
printf cfg.mysqluser and cfg.mysqlpwd on the old (1.3.8) version I get the
correct info, whereas on the new version I get "r" and "" respectively. A
quick strace of pilerexport shows it opening the correct conf file
(/usr/local/etc/piler/piler.conf) on both versions, and specifying the
config file with -c did not help either. Not sure if you can reproduce this
on your end or not.

The odd part is, cfg.mysqluser and cfg.mysqlpwd _IS_ utilized in the
"init_session_data(, );" line above the
"init_session_data(, );" and it passes the database "open" test
just fine there...

Thank you.


On Thu, Apr 8, 2021 at 12:13 AM  wrote:

>
>
> Hello Ryan,
>
> On 2021-04-08 01:36, Ryan Blenis wrote:
> >
> > Thanks, that led me to what is causing the issue / confusion.
> >
> > The -w switch is described as "Where condition to pass to sphinx, eg.
> > "match('@subject: piler')"
> >
> > Which led me to believe the MATCH string was all that was supposed to
> > be there/replaced, however a quick look at the code shows that if -w
> > is used, it REPLACES the ENTIRE where clause. This distinction means
>
> yes, perhaps the "eg." was not that prominent in the short --help
> output.
> I'll improve the docs on the website, it's lagging behind the actual
> features,
> and I'll add a clarification on it.
>
> > The simplest workaround to this for others would be to note that -w
> > allows you to build your own query and negates the use of other
> > parameters. The ideal fix I think would be to still utilize the other
> > parameters, but have -w content appended within the MATCH() portion of
> > the query.
>
> Such fix would only complicate things because you can define the whole
> query using -w including the time frame, recipients, etc. Again, I'll
> add a clarification to the docs.
>
> > Aside from that: I realize I'm behind on piler (1.3.8), and would like
> > to update to get the latest pilerexport with zip capabilities, yet I
> > see there is no upgrade information on
> > https://www.mailpiler.org/wiki/current:upgrade . What is the process
> > to the latest (1.3.11)?
>
> Well, simply compile the new stuff, and overwrite the binaries and the
> GUI files. The database schema hasn't changed from 1.3.8 to 1.3.11.
> However, don't rush with that. The zip export feature has a poor
> performance that needs a rework.
>
> > I'd also like to add a "--num-only" type flag to pilerexport to see
> > the number of matches before exporting (would probably imply dryrun).
>
> We have something similar. When specifying -d (or --dry-run) it prints
> the matching serial ids, eg.
>
> $ pilerexport -d -w "MATCH(' some query')"
> id:318
> id:375
> id:518
> id:656
> id:660
> id:688
> id:733
>
> I'll improve it to add "total:7" as the last line of the output, if
> that's OK.
>
> > When trying to just compile the latest, I get the error "error: cannot
> > connect to 127.0.0.1:9306 [1]" so I'm not sure if that's an issue
> > because not all the components are upgraded, or if I had a different
> > configure flag/path configured during the original install.
>
> The export utility assumes searchd is listening on 127.0.0.1:9306
> That part didn't change from 1.3.8 to 1.3.11.
>
> Janos
>
>


Re: pilerexport w flag matching

2021-04-07 Thread Ryan Blenis
I should clarify the "error: cannot connect to 127.0.0.1:9306" error
message.

This error does not occur at compile time, but only at runtime of the
latest pilerexport, and only when the -w switch is used.

On Wed, Apr 7, 2021 at 7:36 PM Ryan Blenis  wrote:

> Hi Janos,
>
> Thanks, that led me to what is causing the issue / confusion.
>
> The -w switch is described as "Where condition to pass to sphinx, eg.
> "match('@subject: piler')"
>
> Which led me to believe the MATCH string was all that was supposed to be
> there/replaced, however a quick look at the code shows that if -w is used,
> it REPLACES the ENTIRE where clause. This distinction means that the use of
> -w negates the use of the a, b, and r switches as those parameters no
> longer go through your query builder. I was getting more results because it
> wasn't limited to a timeframe or to a recipient due to those flags being
> "skipped".
>
> The simplest workaround to this for others would be to note that -w allows
> you to build your own query and negates the use of other parameters. The
> ideal fix I think would be to still utilize the other parameters, but have
> -w content appended within the MATCH() portion of the query.
>
> Aside from that: I realize I'm behind on piler (1.3.8), and would like to
> update to get the latest pilerexport with zip capabilities, yet I see there
> is no upgrade information on
> https://www.mailpiler.org/wiki/current:upgrade . What is the process to
> the latest (1.3.11)?
>
> I'd also like to add a "--num-only" type flag to pilerexport to see the
> number of matches before exporting (would probably imply dryrun). I didn't
> see a way to do something like that already. If that's something that could
> be added, great, if not, I'll try my hand at it and submit a patch.
>
> When trying to just compile the latest, I get the error "error: cannot
> connect to 127.0.0.1:9306" so I'm not sure if that's an issue because not
> all the components are upgraded, or if I had a different configure
> flag/path configured during the original install.
>
>
> On Wed, Apr 7, 2021 at 5:19 PM Ryan Blenis  wrote:
>
>> Disregard that last email. Coffee is good, not re-running ./configure
>> after installing deps is bad. Following up shortly with more pertinent
>> info. Thank you.
>>
>> On Wed, Apr 7, 2021 at 3:58 PM Ryan Blenis  wrote:
>>
>>> Hi Janos,
>>>
>>> Thanks for the response, in trying to do this (I cloned the repo,
>>> ./configure --localstatedir=/var --with-database=mariadb , and ran make)
>>> and got this:
>>>
>>> Making all in src
>>> make[1]: Entering directory '/tmp/piler/src/piler/src'
>>> gcc -std=c99 -O2 -fPIC -Wall -Wextra -Wimplicit-fallthrough=2
>>> -Wuninitialized -Wno-format-truncation -g  -I. -I..  -I/usr/include/mariadb
>>> -I/usr/include/mariadb/mysql -D_GNU_SOURCE -DHAVE_TRE -DNEED_MYSQL -o
>>> pilerexport pilerexport.c -lpiler -lz -lm -ldl -lcrypto -lssl -ltre
>>> -L/usr/lib/x86_64-linux-gnu/ -lmariadb -L.
>>> /usr/bin/ld: /tmp/ccU39C8h.o: in function `write_to_zip_file':
>>> /tmp/piler/src/piler/src/pilerexport.c:329: undefined reference to
>>> `zip_open'
>>> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:335: undefined
>>> reference to `zip_source_file'
>>> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:336: undefined
>>> reference to `zip_file_add'
>>> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:342: undefined
>>> reference to `zip_close'
>>> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:339: undefined
>>> reference to `zip_strerror'
>>> collect2: error: ld returned 1 exit status
>>> make[1]: *** [Makefile:63: pilerexport] Error 1
>>> make[1]: Leaving directory '/tmp/piler/src/piler/src'
>>> make: *** [Makefile:41: all-recursive] Error 1
>>>
>>> (Note that I originally got a zip.h not found error, which I ran apt
>>> install libzip-dev. Ubuntu 20.04.2 LTS
>>>
>>> I can't seem to get past this point to recompile.
>>>
>>>
>>> On Wed, Apr 7, 2021 at 2:50 PM  wrote:
>>>
>>>>
>>>> Hello Ryan,
>>>>
>>>> please apply this patch to pilerexport.c, and recompile it.
>>>>
>>>> https://bitbucket.org/jsuto/piler/commits/e6607b0bf1d44562bcf2a08e3bfed94181b7b95d
>>>>
>>>> It syslogs the sphinx query. Then try the following. Enter the search
>>>> query
>>>> on the gui, and record the sphinx query syslogged. Then re-run the
>>>> pilerexport

Re: pilerexport w flag matching

2021-04-07 Thread Ryan Blenis
Hi Janos,

Thanks, that led me to what is causing the issue / confusion.

The -w switch is described as "Where condition to pass to sphinx, eg.
"match('@subject: piler')""

That led me to believe the MATCH string was all that was supposed to be
supplied/replaced. However, a quick look at the code shows that if -w is
used, it REPLACES the ENTIRE where clause. This means that using -w negates
the -a, -b, and -r switches, as those parameters no longer go through your
query builder. I was getting more results because the query wasn't limited
to a timeframe or to a recipient: those flags were being "skipped".

The simplest workaround to this for others would be to note that -w allows
you to build your own query and negates the use of other parameters. The
ideal fix I think would be to still utilize the other parameters, but have
-w content appended within the MATCH() portion of the query.
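
To make the distinction concrete, here is a rough sketch, not piler's
actual code. Both function names are hypothetical, "built" stands for
whatever conditions the query builder derives from the other switches, and
"w" stands for the -w value. The first function mirrors the behaviour
described above (a non-empty -w wins outright). The second is a simplified
take on the proposed fix: it ANDs the two together rather than merging into
the MATCH() portion.

```c
#include <stdio.h>
#include <string.h>

/* Behaviour described above: a non-empty w replaces built entirely.
 * Returns 0 on success, -1 if the result does not fit in buf. */
int where_replace(char *buf, size_t size, const char *built, const char *w)
{
    int has_w = (w != NULL && strlen(w) > 0);
    int n = snprintf(buf, size, "%s", has_w ? w : built);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Simplified alternative: keep built and AND the w condition onto it.
 * Returns 0 on success, -1 if the result does not fit in buf. */
int where_combine(char *buf, size_t size, const char *built, const char *w)
{
    int has_w = (w != NULL && strlen(w) > 0);
    int n = has_w ? snprintf(buf, size, "%s AND %s", built, w)
                  : snprintf(buf, size, "%s", built);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Under the first function, any date or recipient restriction has to be
written into the -w value itself, which matches the workaround noted above.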

Aside from that: I realize I'm behind on piler (1.3.8), and would like to
update to get the latest pilerexport with zip capabilities, yet I see there
is no upgrade information on https://www.mailpiler.org/wiki/current:upgrade
. What is the process to the latest (1.3.11)?

I'd also like to add a "--num-only" type flag to pilerexport to see the
number of matches before exporting (would probably imply dryrun). I didn't
see a way to do something like that already. If that's something that could
be added, great, if not, I'll try my hand at it and submit a patch.

When trying to just compile the latest, I get the error "error: cannot
connect to 127.0.0.1:9306" so I'm not sure if that's an issue because not
all the components are upgraded, or if I had a different configure
flag/path configured during the original install.


On Wed, Apr 7, 2021 at 5:19 PM Ryan Blenis  wrote:

> Disregard that last email. Coffee is good, not re-running ./configure
> after installing deps is bad. Following up shortly with more pertinent
> info. Thank you.
>
> On Wed, Apr 7, 2021 at 3:58 PM Ryan Blenis  wrote:
>
>> Hi Janos,
>>
>> Thanks for the response, in trying to do this (I cloned the repo,
>> ./configure --localstatedir=/var --with-database=mariadb , and ran make)
>> and got this:
>>
>> Making all in src
>> make[1]: Entering directory '/tmp/piler/src/piler/src'
>> gcc -std=c99 -O2 -fPIC -Wall -Wextra -Wimplicit-fallthrough=2
>> -Wuninitialized -Wno-format-truncation -g  -I. -I..  -I/usr/include/mariadb
>> -I/usr/include/mariadb/mysql -D_GNU_SOURCE -DHAVE_TRE -DNEED_MYSQL -o
>> pilerexport pilerexport.c -lpiler -lz -lm -ldl -lcrypto -lssl -ltre
>> -L/usr/lib/x86_64-linux-gnu/ -lmariadb -L.
>> /usr/bin/ld: /tmp/ccU39C8h.o: in function `write_to_zip_file':
>> /tmp/piler/src/piler/src/pilerexport.c:329: undefined reference to
>> `zip_open'
>> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:335: undefined
>> reference to `zip_source_file'
>> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:336: undefined
>> reference to `zip_file_add'
>> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:342: undefined
>> reference to `zip_close'
>> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:339: undefined
>> reference to `zip_strerror'
>> collect2: error: ld returned 1 exit status
>> make[1]: *** [Makefile:63: pilerexport] Error 1
>> make[1]: Leaving directory '/tmp/piler/src/piler/src'
>> make: *** [Makefile:41: all-recursive] Error 1
>>
>> (Note that I originally got a zip.h not found error, which I ran apt
>> install libzip-dev. Ubuntu 20.04.2 LTS
>>
>> I can't seem to get past this point to recompile.
>>
>>
>> On Wed, Apr 7, 2021 at 2:50 PM  wrote:
>>
>>>
>>> Hello Ryan,
>>>
>>> please apply this patch to pilerexport.c, and recompile it.
>>>
>>> https://bitbucket.org/jsuto/piler/commits/e6607b0bf1d44562bcf2a08e3bfed94181b7b95d
>>>
>>> It syslogs the sphinx query. Then try the following. Enter the search
>>> query
>>> on the gui, and record the sphinx query syslogged. Then re-run the
>>> pilerexport command, and record the new sphinx query, and compare it
>>> with the previous value.
>>>
>>> Verify that even the single-quotes and double quotes are the same in
>>> both queries.
>>>
>>> Janos SUTO
>>>
>>>
>>> On 2021-04-07 18:18, Ryan Blenis wrote:
>>> > Hi Janos,
>>> >
>>> > I have to export potentially a ton of emails and was looking to use
>>> > pilerexport versus multiple batches of GUI searches. I saw the -w flag
>>> > and thought "great, I can use this" but it doesn't seem to respond
>>> > ap

Re: pilerexport w flag matching

2021-04-07 Thread Ryan Blenis
Disregard that last email. Coffee is good, not re-running ./configure after
installing deps is bad. Following up shortly with more pertinent info.
Thank you.

On Wed, Apr 7, 2021 at 3:58 PM Ryan Blenis  wrote:

> Hi Janos,
>
> Thanks for the response, in trying to do this (I cloned the repo,
> ./configure --localstatedir=/var --with-database=mariadb , and ran make)
> and got this:
>
> Making all in src
> make[1]: Entering directory '/tmp/piler/src/piler/src'
> gcc -std=c99 -O2 -fPIC -Wall -Wextra -Wimplicit-fallthrough=2
> -Wuninitialized -Wno-format-truncation -g  -I. -I..  -I/usr/include/mariadb
> -I/usr/include/mariadb/mysql -D_GNU_SOURCE -DHAVE_TRE -DNEED_MYSQL -o
> pilerexport pilerexport.c -lpiler -lz -lm -ldl -lcrypto -lssl -ltre
> -L/usr/lib/x86_64-linux-gnu/ -lmariadb -L.
> /usr/bin/ld: /tmp/ccU39C8h.o: in function `write_to_zip_file':
> /tmp/piler/src/piler/src/pilerexport.c:329: undefined reference to
> `zip_open'
> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:335: undefined
> reference to `zip_source_file'
> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:336: undefined
> reference to `zip_file_add'
> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:342: undefined
> reference to `zip_close'
> /usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:339: undefined
> reference to `zip_strerror'
> collect2: error: ld returned 1 exit status
> make[1]: *** [Makefile:63: pilerexport] Error 1
> make[1]: Leaving directory '/tmp/piler/src/piler/src'
> make: *** [Makefile:41: all-recursive] Error 1
>
> (Note that I originally got a zip.h not found error, which I ran apt
> install libzip-dev. Ubuntu 20.04.2 LTS
>
> I can't seem to get past this point to recompile.
>
>
> On Wed, Apr 7, 2021 at 2:50 PM  wrote:
>
>>
>> Hello Ryan,
>>
>> please apply this patch to pilerexport.c, and recompile it.
>>
>> https://bitbucket.org/jsuto/piler/commits/e6607b0bf1d44562bcf2a08e3bfed94181b7b95d
>>
>> It syslogs the sphinx query. Then try the following. Enter the search
>> query
>> on the gui, and record the sphinx query syslogged. Then re-run the
>> pilerexport command, and record the new sphinx query, and compare it
>> with the previous value.
>>
>> Verify that even the single-quotes and double quotes are the same in
>> both queries.
>>
>> Janos SUTO
>>
>>
>> On 2021-04-07 18:18, Ryan Blenis wrote:
>> > Hi Janos,
>> >
>> > I have to export potentially a ton of emails and was looking to use
>> > pilerexport versus multiple batches of GUI searches. I saw the -w flag
>> > and thought "great, I can use this" but it doesn't seem to respond
>> > appropriately for my test case. I have 2 emails that match the
>> > following (generalized terms used vs actual), limiting with -m 3 for
>> > testing purposes (I should only get 2 back).
>> >
>> > pilerexport -a 2010.10.01 -b 2021.04.06 -r "j...@domain.com" -m 3 -w
>> > 'MATCH('"'"'searchterm NEAR/25 (MNF|(search term)|term|(test search
>> > term)|termin*)'"'"')'
>> >
>> > Now, that match is just the bash string escaped version of:
>> > MATCH('searchterm NEAR/25 (MNF|(search term)|term|(test search
>> > term)|termin*)')
>> > (That's just a fancy sphinx query for "searchterm" within 25 words of
>> > MNF OR "search term" OR "term" OR "test search term" or "termin*" for
>> > those unfamiliar with sphinx.)
>> >
>> > Which, when overloading the Advanced Search for the "body" field in
>> > the GUI with:
>> > searchterm NEAR/25 (MNF|(search term)|term|(test search term)|termin*)
>> >
>> > Seems to work just fine and as expected, however, in pilerexport with
>> > the aforementioned command I get tons of unrelated emails (not even
>> > scoped to the appropriate j...@domain.com recipient). Is using a MATCH
>> > term like this with -w possible, or am I looking to do too much here?
>> >
>> > Note that I saw you added the -o parameter in the source so I may be a
>> > version or 2 back (utility doesn't seem to have a -v or --version
>> > output), and my version doesn't appear to have that, so I don't really
>> > have any great diagnostic/output information to go off of other than
>> > the above description.
>> >
>> > Thank you in advance as always for any insight you can give!
>>
>


Re: pilerexport w flag matching

2021-04-07 Thread Ryan Blenis
Hi Janos,

Thanks for the response. In trying to do this, I cloned the repo, ran
./configure --localstatedir=/var --with-database=mariadb and make, and got
this:

Making all in src
make[1]: Entering directory '/tmp/piler/src/piler/src'
gcc -std=c99 -O2 -fPIC -Wall -Wextra -Wimplicit-fallthrough=2 -Wuninitialized -Wno-format-truncation -g  -I. -I..  -I/usr/include/mariadb -I/usr/include/mariadb/mysql -D_GNU_SOURCE -DHAVE_TRE -DNEED_MYSQL -o pilerexport pilerexport.c -lpiler -lz -lm -ldl -lcrypto -lssl -ltre -L/usr/lib/x86_64-linux-gnu/ -lmariadb -L.
/usr/bin/ld: /tmp/ccU39C8h.o: in function `write_to_zip_file':
/tmp/piler/src/piler/src/pilerexport.c:329: undefined reference to `zip_open'
/usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:335: undefined reference to `zip_source_file'
/usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:336: undefined reference to `zip_file_add'
/usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:342: undefined reference to `zip_close'
/usr/bin/ld: /tmp/piler/src/piler/src/pilerexport.c:339: undefined reference to `zip_strerror'
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:63: pilerexport] Error 1
make[1]: Leaving directory '/tmp/piler/src/piler/src'
make: *** [Makefile:41: all-recursive] Error 1

(Note that I originally got a "zip.h not found" error, which I fixed by
running apt install libzip-dev. This is on Ubuntu 20.04.2 LTS.)

I can't seem to get past this point to recompile.


On Wed, Apr 7, 2021 at 2:50 PM  wrote:

>
> Hello Ryan,
>
> please apply this patch to pilerexport.c, and recompile it.
>
> https://bitbucket.org/jsuto/piler/commits/e6607b0bf1d44562bcf2a08e3bfed94181b7b95d
>
> It syslogs the sphinx query. Then try the following. Enter the search
> query
> on the gui, and record the sphinx query syslogged. Then re-run the
> pilerexport command, and record the new sphinx query, and compare it
> with the previous value.
>
> Verify that even the single-quotes and double quotes are the same in
> both queries.
>
> Janos SUTO
>
>
> On 2021-04-07 18:18, Ryan Blenis wrote:
> > Hi Janos,
> >
> > I have to export potentially a ton of emails and was looking to use
> > pilerexport versus multiple batches of GUI searches. I saw the -w flag
> > and thought "great, I can use this" but it doesn't seem to respond
> > appropriately for my test case. I have 2 emails that match the
> > following (generalized terms used vs actual), limiting with -m 3 for
> > testing purposes (I should only get 2 back).
> >
> > pilerexport -a 2010.10.01 -b 2021.04.06 -r "j...@domain.com" -m 3 -w
> > 'MATCH('"'"'searchterm NEAR/25 (MNF|(search term)|term|(test search
> > term)|termin*)'"'"')'
> >
> > Now, that match is just the bash string escaped version of:
> > MATCH('searchterm NEAR/25 (MNF|(search term)|term|(test search
> > term)|termin*)')
> > (That's just a fancy sphinx query for "searchterm" within 25 words of
> > MNF OR "search term" OR "term" OR "test search term" or "termin*" for
> > those unfamiliar with sphinx.)
> >
> > Which, when overloading the Advanced Search for the "body" field in
> > the GUI with:
> > searchterm NEAR/25 (MNF|(search term)|term|(test search term)|termin*)
> >
> > Seems to work just fine and as expected, however, in pilerexport with
> > the aforementioned command I get tons of unrelated emails (not even
> > scoped to the appropriate j...@domain.com recipient). Is using a MATCH
> > term like this with -w possible, or am I looking to do too much here?
> >
> > Note that I saw you added the -o parameter in the source so I may be a
> > version or 2 back (utility doesn't seem to have a -v or --version
> > output), and my version doesn't appear to have that, so I don't really
> > have any great diagnostic/output information to go off of other than
> > the above description.
> >
> > Thank you in advance as always for any insight you can give!
>


pilerexport w flag matching

2021-04-07 Thread Ryan Blenis
Hi Janos,

I have to export potentially a ton of emails and was looking to use
pilerexport versus multiple batches of GUI searches. I saw the -w flag and
thought "great, I can use this" but it doesn't seem to respond
appropriately for my test case. I have 2 emails that match the following
(generalized terms used vs actual), limiting with -m 3 for testing purposes
(I should only get 2 back).

pilerexport -a 2010.10.01 -b 2021.04.06 -r "j...@domain.com" -m 3 -w
'MATCH('"'"'searchterm NEAR/25 (MNF|(search term)|term|(test search
term)|termin*)'"'"')'

Now, that match is just the bash string escaped version of:
MATCH('searchterm NEAR/25 (MNF|(search term)|term|(test search
term)|termin*)')
(That's just a fancy sphinx query for "searchterm" within 25 words of MNF
OR "search term" OR "term" OR "test search term" or "termin*" for those
unfamiliar with sphinx.)
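
For anyone wanting to double check the quoting, here is a minimal Python
sketch (my own illustration, not anything shipped with piler) that uses the
standard shlex module to reproduce the escaped form above and confirm that a
POSIX shell would hand the original condition to -w unchanged. The shortened
command line inside it is only for the round-trip check.

```python
import shlex

# The raw WHERE condition as sphinx should see it (taken from the message above).
where = ("MATCH('searchterm NEAR/25 (MNF|(search term)|term|"
         "(test search term)|termin*)')")

# shlex.quote wraps the string in single quotes and rewrites each embedded
# single quote as '"'"', the same idiom used in the command above.
quoted = shlex.quote(where)
print(quoted)

# Round trip: split a command line the way a POSIX shell would and check
# that the last argument is the original, unescaped condition.
argv = shlex.split("pilerexport -m 3 -w " + quoted)
print(argv[-1] == where)
```

If the round trip holds, the quoting itself is probably not what is causing
the unrelated results.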

Which, when overloading the Advanced Search for the "body" field in the GUI
with:
searchterm NEAR/25 (MNF|(search term)|term|(test search term)|termin*)

Seems to work just fine and as expected, however, in pilerexport with the
aforementioned command I get tons of unrelated emails (not even scoped to
the appropriate j...@domain.com recipient). Is using a MATCH term like this
with -w possible, or am I looking to do too much here?

Note that I saw you added the -o parameter in the source so I may be a
version or 2 back (utility doesn't seem to have a -v or --version output),
and my version doesn't appear to have that, so I don't really have any
great diagnostic/output information to go off of other than the above
description.

Thank you in advance as always for any insight you can give!


Reindex Question / Clarification

2020-09-21 Thread Ryan Blenis
Hi Janos,

Quick question on reindexing and its effect on the size of the sphinx files.

If I reindex something that is already in the index, will there be
duplicates in sphinx wasting space, or is this detected and removed?

Background: We weren't normally searching for anything old, so we cleared
the sphinx data and indexed only the last year's worth of data. Now we have
requests for data from parts of 2018, 2017, etc., which we've reindexed as
needed to make searchable. As more requests come in, it seems it would be
better to reindex everything, throw resources at it if needed, and cut out
the manual process of indexing anything older than a year as each request
arrives. If I reindex everything, will I be wasting space (so I should
start fresh), or will the duplicates be filtered out at some point in the
process?

Thank you as always.


Re: Rcpt Table Question

2020-05-02 Thread Ryan Blenis
Hi Janos,

Thanks for the quick reply. Yes they're all different... but very oddly
formatted.

I can send you a copy of the output directly if you'd like, as I don't want
to expose a bunch of emails. But the gist is that many have periods,
dashes, or numbers preceding the actual email address, in some cases as odd
as "--@--.com", "@hotmail.com", "---@hotmail.com",
"-@..immature", "-@..khairy", "-@d.i.s.c.o", "..@www.hip",
"..redactedhotmailaddr...@hotmail.com..me", "._.pitch@n._", etc.

I did a pilerget on the actual email and it contains one TO address (a
correct address; the only odd thing I noticed is that the TO address is in
all caps, and from looking at the DB/sphinx I know in some places you were
using "X" as a placeholder for periods).

I just migrated from 1.1.0 (since 2015) and set up a new server on 1.3.8,
so the issue would have occurred under 1.1.0 at least. My upgrade method was
a bit of a mix after reading all your documentation on upgrading and server
migrations (since I was doing both), as well as Bitbucket issues. I know
it's been a while, but I really wanted to take the time to understand Piler
before running an upgrade and potentially screw something up.

I looked for a changelog with a fix like this, but I didn't see a formal
one, and looking through all the commits since 2015 was going to take a
LONG time so I figured I'd ask here. I can't see if I still get new rcpt
rows for that id, as I tossed my backups of prior DB versions on the old
server already, but I can certainly pay attention on the new server.

select id, COUNT(*) from rcpt GROUP BY id HAVING COUNT(*) > 50 ORDER BY
COUNT(*) ASC; shows that there are definitely a couple of other ids with
this issue, and likely many more, but there is a steep dropoff from 3
million to 30k to the 200-500 range per id. Maybe you (or other users) can
use that query to check the recipient count for each stored email for
similar issues.
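
To show what that query returns, here is a small self-contained Python
sketch that runs it against an in-memory SQLite database. The two-column
rcpt table and the "addr" column name are simplifications I made up for the
demo, not the real piler schema, and the rows are toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for the rcpt table.
conn.execute("CREATE TABLE rcpt (id INTEGER, addr TEXT)")

# One normal message with two recipients, one runaway id with 60 rows.
rows = [(1, "a@example.com"), (1, "b@example.com")]
rows += [(37, f"junk{i}@example.com") for i in range(60)]
conn.executemany("INSERT INTO rcpt VALUES (?, ?)", rows)

THRESHOLD = 50  # same cut-off as in the query above

# Same GROUP BY / HAVING shape as the query above, with the threshold bound
# as a parameter.
suspects = conn.execute(
    "SELECT id, COUNT(*) FROM rcpt GROUP BY id "
    "HAVING COUNT(*) > ? ORDER BY COUNT(*) ASC",
    (THRESHOLD,),
).fetchall()
print(suspects)
```

Only ids whose row count exceeds the threshold come back, so the normal
message is left out of the result.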

I just wanted to make sure that my assumption on the rcpt table was correct
(i.e. it should only have the "to" recipients for each email matching on
metadata.id), so that I can delete all those rows and either manually
insert the address found from pilerget or delete the matching metadata row
and just re-import the email to fix the issue. Thanks in advance.


On Sat, May 2, 2020 at 8:29 AM  wrote:

>
> Hello Ryan,
>
> this must be definitely a bug. What piler version do you use?
> Try selecting the first 1000 rows for id=37, and check if the
> recipients are actually all different.
>
> Do you still get new rcpt rows for id=37?
> I suspect that piler tries to keep processing the very same email.
>
> Janos
>
>
> On 2020-05-02 09:41, Ryan Blenis wrote:
> > Hello,
> >
> > Just a question about the table. I took a look at the data, and I've
> > got 23 million rows in there, however I kept seeing id 37 pop up. I
> > queried for "SELECT COUNT(*) FROM rcpt WHERE id = 37" and got a count
> > of 3,116,977.
> >
> > I was under the impression (perhaps mistakenly) that the table was
> > used to store recipients of the email linked on metadata.id [1] - but
> > certainly no email had 3 million recipients. Can you let me know what
> > I'm missing here? Thank you.
> >
>


Rcpt Table Question

2020-05-02 Thread Ryan Blenis
Hello,

Just a question about the table. I took a look at the data, and I've got 23
million rows in there, however I kept seeing id 37 pop up. I queried for
"SELECT COUNT(*) FROM rcpt WHERE id = 37" and got a count of 3,116,977.

I was under the impression (perhaps mistakenly) that the table was used to
store recipients of the email linked on metadata.id - but certainly no
email had 3 million recipients. Can you let me know what I'm missing here?
Thank you.


Re: S3 Object Storage

2020-04-06 Thread Ryan Blenis
Hi Janos,

My preferred provider is Linode; I've had a couple of VMs with them for years
and they've always been top-notch. I won't mention the other provider (I'm
not one to bash) currently hosting my Piler instance that I see service
drops on almost every morning, but I will say that (failing to have S3
support) I'd be looking to migrate from them to RamNode as they appear to
have decent reviews and a "spinning disk" option with a good amount of
space (comparable to that of the current host). So if you/anyone has any
praise or warnings of RamNode hosting Piler, I'm all ears.

I have not yet had to dabble in the C code for anything, but am familiar
with C/C++ and a few other various languages, so I always like to have the
option. It's good to know the web UI is still open in the enterprise
version. As for trying enterprise at this time, I suppose I'll try the
RamNode route first with spinning disk block storage. If the provider isn't
as stable as I'd like I may have to reconsider, but for now it will be a
matter of cost of licensing Piler purely for the S3 capability versus
minimizing my hosting provider options to those with larger storage
options. If I require a more stable host and can't find any with
inexpensive storage, it just comes out to a cost calculation between block
storage cost in a provider versus Piler enterprise costs for S3
compatibility... unless of course you come out with a discounted S3 "add-on
license" for just that feature that makes the math a no-brainer.

I've been using Piler for so many years personally, I should probably just
start pushing clients toward it, but I've had some issues with it missing
mails due to service outages, failing to load the UI unless the service is
restarted (after running for several days), failing to clear the sph_index
queue on a semi-regular basis, and a few other quirks along the way that
made me pause before offering it as an actual service in a customer-facing
manner. I've automated most of that fixing itself with Zabbix monitoring,
but some issues remain. Honestly though, I've attributed most if not all of
that to the current VPS host. If the new version I install on a new host
sees all those issues go away, and I'm comfortable having potential clients
log into the interface and know something random isn't going to happen to
them, the easy call is to push clients toward it and get an enterprise
license anyway. So here's hoping that's the case! I'd love to get rid of
the proprietary options I've got some clients on and move everyone to Piler
if possible.

Thanks again Janos!

On Mon, Apr 6, 2020 at 3:11 AM  wrote:

>
> Hello Ryan,
>
> On 2020-04-06 00:45, Ryan Blenis wrote:
> >
> > Thank you as always for your quick and in-depth response. I certainly
> > understand the delicate balance between FOSS and enterprise paywalls
> > and feature-sets, and I thank you for even having an open source
> > option to begin with! I'm just planning a migration to a new server
> > (for when my current OS is EOL) and wanted to move the hosting to one
> > with better support that doesn't over-provision the host's resources as
> > I've seen on the current host, but HDD space isn't anywhere near as
> > inexpensive as it is at the current company. I was just thinking I
> > could have the best of both worlds with S3 storage integrated. Many of
> > the FOSS/enterprise paywalls in other projects are based on LDAP
> > integration, multi-tenancy, pre-packaged binaries, support levels,
> > etc. that many businesses need, whereas object storage vs block storage
> > felt like more of a separate paywall in terms of cloud providers, so I
> > just figured I'd ask if it was coming to the FOSS version. Also, I
>
> it's ok to ask, and I always appreciate the feedback from piler users.
> Btw. can you share the names of these 2 providers? I'm curious about
> their pricing. And perhaps the piler community may suggest some better
> alternatives both in terms of hw performance and storage pricing.
>
>
> > like to be able to modify the code if a customization can be made,
> > which I assume would not be the case with enterprise, correct?
>
> Not sure if you fix the C source code before compiling. If not, then
> it's probably not a big deal that you can't modify the enterprise
> edition binaries. However, the gui part is still the usual php based
> stuff, it's not obfuscated, so you can fix it if you really want to do
> that. It's your archive after all. Custom css is supported out of the box.
>
> > Also, if I try the enterprise on the new host, is there a path to
> > downgrade to the FOSS version? Or would starting back from scratch be
> > the only option?
>
> Unfortunately no. The enterprise edition handles attachments in a
> different way: it doesn't need the attachment table

Re: S3 Object Storage

2020-04-05 Thread Ryan Blenis
Hi Janos,

Thank you as always for your quick and in-depth response. I certainly
understand the delicate balance between FOSS and enterprise paywalls and
feature-sets, and I thank you for even having an open source option to
begin with! I'm just planning a migration to a new server (for when my
current OS is EOL) and wanted to move the hosting to one with better
support that doesn't over-provision the host's resources as I've seen on the
current host, but HDD space isn't anywhere near as inexpensive as it is at
the current company. I was just thinking I could have the best of both
worlds with S3 storage integrated. Many of the FOSS/enterprise paywalls in
other projects are based on LDAP integration, multi-tenancy, pre-packaged
binaries, support levels, etc. that many businesses need, whereas object
storage vs block storage felt like more of a separate paywall in terms of
cloud providers, so I just figured I'd ask if it was coming to the FOSS
version. Also, I like to be able to modify the code if a customization can
be made, which I assume would not be the case with enterprise, correct?

Also, if I try the enterprise on the new host, is there a path to downgrade
to the FOSS version? Or would starting back from scratch be the only
option?

Thank you as always again.

On Sun, Apr 5, 2020 at 7:16 AM  wrote:

>
> Hello Ryan,
>
> On 2020-04-05 10:55, Ryan Blenis wrote:
> >
> > I see that the enterprise version you offer lists S3 object storage as
> > one of its features, which looks like it was originally requested on
> > the mailing list in 2018:
> > https://www.mail-archive.com/piler-user@list.acts.hu/msg01335.html
> >
> > Does this mean that it is fully implemented now? And if so, will we
>
> yes, it's fully implemented by now.
>
>
> > ever see this in the open source version? Given the lower prices on S3
> > compatible storage versus block storage and the unchanging nature of
> > the files, this would be great to have as an option that can reduce
> > hosting costs from what is now being used for [comparatively]
> > expensive block storage.
>
>
> well, I've been thinking about it for a while, and still not decided,
> it's a difficult business decision, so let me share my current position
> on the matter.
>
> TL;DR: not in the near future.
>
> I'm really proud that the open source edition has gained a pretty nice
> traction, and I decided to somehow monetize its popularity.
>
> So I've forked piler and built an enterprise edition along with some
> commercial services. To make the enterprise edition successful, I must
> offer some features that are worth paying for and that are not available
> for free in the open source edition.
>
> I believe that the S3 storage support you asked for is such an
> attractive feature, so I'll keep it for the enterprise edition only,
> not sure how long.
>
> However, I'm well aware that S3 costs are significantly lower than
> block storage costs at cloud hosting companies. I encourage you to try
> the enterprise edition, and see if it works for you. If so, and you like
> it, then perhaps we could figure out something viable to make the
> transition from open source to enterprise. I'd like to emphasize that I
> don't want to force you in this direction, and it's totally ok to decline.
>
> Let me know what you (and other piler users as well) think about the
> matter.
>
> Janos
>


S3 Object Storage

2020-04-05 Thread Ryan Blenis
Hi Janos,

I see that the enterprise version you offer lists S3 object storage as one
of its features, which looks like it was originally requested on the
mailing list in 2018:
https://www.mail-archive.com/piler-user@list.acts.hu/msg01335.html

Does this mean that it is fully implemented now? And if so, will we ever
see this in the open source version? Given the lower prices on S3
compatible storage versus block storage and the unchanging nature of the
files, this would be great to have as an option that can reduce hosting
costs from what is now being used for [comparatively] expensive block
storage.

Thank you as always.


Re: Exporting Bulk Emails

2019-09-26 Thread Ryan Blenis
Apologies, I jumped the gun on this request. The pilerexport --help output
indicates that -F and -R can be used for a specific domain, which is not
mentioned on http://www.mailpiler.org/wiki/current:exporting-emails (that
page only covers a specific email address). Thank you.

On Thu, Sep 26, 2019 at 12:57 PM Ryan Blenis  wrote:

> Hello,
>
> What is the process for, or possibility of, mass exporting emails, such
> as when all emails for a certain domain spanning several years (e.g. 1
> million emails) need to be offloaded, zipped up, and handed off?
>
> Thank you.
>


Exporting Bulk Emails

2019-09-26 Thread Ryan Blenis
Hello,

What is the process for, or possibility of, mass exporting emails, such as
when all emails for a certain domain spanning several years (e.g. 1 million
emails) need to be offloaded, zipped up, and handed off?

Thank you.


Re: Retention-rules and pilerpurge

2018-11-13 Thread Ryan Blenis
> > Do you have any idea what I should
> > do so that more messages are being purged? Especially the retention
> > period seems suspect I think...
> For already archived spam, you should identify them in the metadata
> table, eg. check the subject field. Then set the retained column for
> such messages only to the past, eg. a yesterday timestamp, and
> pilerpurge will take care of them.

Hello Janos,

Seeing your above reply to Marina's post regarding purging data brought me
back to an issue we were trying to clear up (we have a certain email "from"
address we'd like purged, as it is not required in the archive). However,
we lacked the foresight to set up retention rules prior to the email
account spewing out millions of emails.

I figured I'd just go in and manually set the retention date back in the
past as you suggest, however I found the following note on
http://www.mailpiler.org/wiki/current:administering-piler:

--
Note that the retention period is included in the per message verification
digest, so the retention period should not be changed after the message has
been archived.
--

I've run a test on the database by selecting one email and changing the
retention period to the past and running

pilerpurge -d

to see the effect, and it says it will remove it.
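
For reference, the kind of change I tested can be sketched in Python against
an in-memory SQLite stand-in. This is only an illustration under several
assumptions: the metadata table here is a cut-down mock, the "from" column
name and the use of a Unix timestamp in the retained column are my
assumptions, and the addresses are made up. It does not touch a real piler
database.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Cut-down mock of the metadata table; column names are assumptions.
conn.execute(
    'CREATE TABLE metadata (id INTEGER PRIMARY KEY, "from" TEXT, retained INTEGER)'
)

now = int(time.time())
one_year_ahead = now + 365 * 86400
conn.executemany(
    "INSERT INTO metadata VALUES (?, ?, ?)",
    [(1, "noisy@example.com", one_year_ahead),
     (2, "keep@example.com", one_year_ahead)],
)

# Move the retention date of the unwanted sender's messages to yesterday.
yesterday = now - 86400
conn.execute(
    'UPDATE metadata SET retained = ? WHERE "from" = ?',
    (yesterday, "noisy@example.com"),
)

# Messages whose retention date is now in the past, i.e. purge candidates.
expired = [row[0] for row in
           conn.execute("SELECT id FROM metadata WHERE retained < ?", (now,))]
print(expired)
```

Only the message from the unwanted sender ends up with a retention date in
the past; the other row is untouched.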

If so, what are the ramifications of doing this, given that the message
verification digest will no longer match?

Is this a viable method to get rid of the cruft we've accumulated and would
like to purge?

Thank you.