Re: Solr cloud questions

2019-08-16 Thread Kojo
Ere,
thanks for the advice. I don't have this specific use case, but I am doing
some operations that I think could be risky, since this is the first time I
am using them.

There is a page that groups by one specific attribute of documents
distributed across shards. I am using composite ID routing so that grouping
works correctly, but I don't know the performance cost of this. This page
groups and lists these attributes as "snippets", and it allows paging.
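
For context, this is roughly how I route the grouped documents when indexing
(a simplified sketch; the collection and field names are made up, not my real
schema). Prefixing the id with the grouping key sends all documents of a
group to the same shard:

    import requests

    # Hypothetical docs: "groupA!" is the composite-ID routing prefix, so
    # both documents land on the same shard and can be grouped there.
    docs = [
        {"id": "groupA!doc1", "group_field": "groupA"},
        {"id": "groupA!doc2", "group_field": "groupA"},
    ]
    requests.post(
        "http://localhost:8983/solr/my_collection/update?commit=true",
        json=docs,
    )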

I am doing some graph queries too, using streaming. As far as I can observe,
these features are not causing the problem I described.

Thank you,
Koji






On Fri, Aug 16, 2019 at 04:34, Ere Maijala wrote:

> Does your web application, by any chance, allow deep paging or something
> like that which requires returning rows at the end of a large result
> set? Something like a query where you could have parameters like
> rows=10&start=1000000 ? That can easily cause OOM with Solr when using
> a sharded index. It would typically require a large number of rows to be
> returned and combined from all shards just to get the few rows to return
> in the correct order.
>
> For the above example with 8 shards, Solr would have to fetch 1 000 010
> rows from each shard. That's over 8 million rows! Even if it's just
> identifiers, that's a lot of memory required for an operation that seems
> so simple from the surface.
>
> If this is the case, you'll need to prevent the web application from
> issuing such queries. This may mean something like supporting paging
> only among the first 10 000 results. A typical requirement may also be to
> be able to see the last results of a query, but this can be accomplished
> by allowing sorting in both ascending and descending order.
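>
> For illustration, a guard on the client side could look roughly like this
> (a Python sketch with made-up names, not your actual code):
>
>     import requests
>
>     MAX_WINDOW = 10000  # refuse to page past the first 10 000 results
>
>     def search(solr_url, q, start=0, rows=10):
>         if start + rows > MAX_WINDOW:
>             raise ValueError("paging beyond the first %d results "
>                              "is not supported" % MAX_WINDOW)
>         params = {"q": q, "start": start, "rows": rows}
>         return requests.get(solr_url + "/select", params=params).json()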
>
> Regards,
> Ere
>
> Kojo wrote on 14.8.2019 at 16.20:
> > Shawn,
> >
> > Only my web application accesses this Solr. At a first look at the HTTP
> > server logs I didn't find anything unusual. Sometimes a very big crawler
> > hits my servers; that was my first bet.
> >
> > No scheduled cron jobs were running at that time either.
> >
> > I think that I will reconfigure my boxes with two Solr nodes each instead
> > of four and increase the heap to 16GB. Each box only runs Solr and has
> > 64GB. Each Solr will use 16GB and the box will still have 32GB for the
> > OS. What do you think?
> >
> > This is a production server, so I will plan to migrate.
> >
> > Regards,
> > Koji
> >
> >
> > On Tue, Aug 13, 2019 at 12:58, Shawn Heisey wrote:
> >
> >> On 8/13/2019 9:28 AM, Kojo wrote:
> >>> Here are the last two gc logs:
> >>>
> >>>
> >>
> https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ
> >>
> >> Thank you for that.
> >>
> >> Analyzing the 20MB gc log actually looks like a pretty healthy system.
> >> That log covers 58 hours of runtime, and everything looks very good to
> me.
> >>
> >> https://www.dropbox.com/s/yu1pyve1bu9maun/gc-analysis-kojo.png?dl=0
> >>
> >> But the small log shows a different story.  That log only covers a
> >> little more than four minutes.
> >>
> >> https://www.dropbox.com/s/vkxfoihh12brbnr/gc-analysis-kojo2.png?dl=0
> >>
> >> What happened at approximately 10:55:15 PM on the day that the smaller
> >> log was produced?  Whatever happened caused Solr's heap usage to
> >> skyrocket and require more than 6GB.
> >>
> >> Thanks,
> >> Shawn
> >>
> >
>
> --
> Ere Maijala
> Kansalliskirjasto / The National Library of Finland
>


Re: Solr cloud questions

2019-08-15 Thread Kojo
Erick,
I am using Python, so I think SolrJ is not an option. I wrote my own
libraries to connect to Solr and interpret the Solr data.

I will try to load balance via the Apache server that is in front of Solr
before I change my setup; I think it will be simpler. I was not aware of the
single point of failure in my SolrCloud setup when I set up my infrastructure.
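
In the meantime I may also try a simple rotation in my own Python layer;
a minimal sketch of the idea (the node URLs are placeholders):

    import random
    import requests

    SOLR_NODES = [
        "http://solr1:8983/solr",
        "http://solr2:8983/solr",
    ]

    def query(collection, params):
        # Try the nodes in random order, so one dead node does not
        # stop all requests.
        for node in random.sample(SOLR_NODES, len(SOLR_NODES)):
            try:
                return requests.get("%s/%s/select" % (node, collection),
                                    params=params, timeout=10).json()
            except requests.RequestException:
                continue
        raise RuntimeError("no Solr node answered")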

Thank you so much for your help,
Koji





On Thu, Aug 15, 2019 at 14:11, Erick Erickson wrote:

> OK, if you’re sending HTTP requests to a single node, that’s
> something of an anti-pattern unless it’s a load balancer that
> sends requests to random nodes in the cluster. Do note that
> even if you do send all http requests to one node, the top-level
> request will be forwarded to other nodes in the cluster.
>
> But if your single node dies, then indeed there's no way for the request
> to reach the other Solr nodes.
>
> If you use SolrJ, in particular CloudSolrClient, it’s ZooKeeper-aware
> and will both avoid dead nodes _and_ distribute the top-level
> queries to all the Solr nodes. It’ll also be informed when a dead
> node comes back and put it back into the rotation.
>
> Best,
> Erick
>
> > On Aug 15, 2019, at 10:14 AM, Kojo  wrote:
> >
> > Erick,
> > I am starting to think that my setup has more than one problem.
> > As I said before, I am not balancing the load across my Solr nodes, and I
> > have eight nodes. All of my web application requests go to one Solr node,
> > the only one that dies. If I distribute the load across the other nodes,
> > is it possible that these problems will go away?
> >
> > Even if I downsize the SolrCloud setup to 2 boxes with 2 nodes each and
> > fewer shards than the 16 I have now, I would like to know your opinion
> > about the question above.
> >
> > Thank you,
> > Koji
> >
> >
> >
> >
> > On Wed, Aug 14, 2019 at 14:15, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Kojo:
> >>
> >> On the surface, this is a reasonable configuration. Note that you may
> >> still want to decrease the Java heap, but only if you have enough “head
> >> room” for memory spikes.
> >>
> >> How do you know if you have “head room”? Unfortunately the only good
> >> answer is “you have to test”. You can look at the GC logs to see what
> your
> >> maximum heap requirements are, then add “some extra”.
> >>
> >> Note that there’s a balance here. Let’s say you can run successfully
> with
> >> X heap, so you allocate X + 0.1X to the heap. You can wind up spending a
> >> large amount of time in garbage collection. I.e. GC kicks in and
> recovers
> >> _just_ enough memory to continue for a very short while, then goes into
> >> another GC cycle. You don’t hit OOMs, but your system is slow.
> >>
> >> OTOH, let’s say you need X and allocate 3X. Garbage will accumulate and
> >> full GCs are rarer, but when they occur they take longer.
> >>
> >> And the G1GC collector is the current preference.
> >>
> >> As I said, testing is really the only way to determine what the magic
> >> number is.
> >>
> >> Best,
> >> Erick
> >>
> >>> On Aug 14, 2019, at 9:20 AM, Kojo  wrote:
> >>>
> >>> Shawn,
> >>>
> >>> Only my web application accesses this Solr. At a first look at the HTTP
> >>> server logs I didn't find anything unusual. Sometimes a very big crawler
> >>> hits my servers; that was my first bet.
> >>>
> >>> No scheduled cron jobs were running at that time either.
> >>>
> >>> I think that I will reconfigure my boxes with two Solr nodes each instead
> >>> of four and increase the heap to 16GB. Each box only runs Solr and has
> >>> 64GB. Each Solr will use 16GB and the box will still have 32GB for the
> >>> OS. What do you think?
> >>>
> >>> This is a production server, so I will plan to migrate.
> >>>
> >>> Regards,
> >>> Koji
> >>>
> >>>
> >>> On Tue, Aug 13, 2019 at 12:58, Shawn Heisey wrote:
> >>>
> >>>> On 8/13/2019 9:28 AM, Kojo wrote:
> >>>>> Here are the last two gc logs:
> >>>>>
> >>>>>
> >>>>
> >>
> https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ
> >>>>
> >>>> Thank you for that.
> >>>>
> >>>> Analyzing the 20MB gc log actually looks like a pretty healthy system.
> >>>> That log covers 58 hours of runtime, and everything looks very good to
> >> me.
> >>>>
> >>>> https://www.dropbox.com/s/yu1pyve1bu9maun/gc-analysis-kojo.png?dl=0
> >>>>
> >>>> But the small log shows a different story.  That log only covers a
> >>>> little more than four minutes.
> >>>>
> >>>> https://www.dropbox.com/s/vkxfoihh12brbnr/gc-analysis-kojo2.png?dl=0
> >>>>
> >>>> What happened at approximately 10:55:15 PM on the day that the smaller
> >>>> log was produced?  Whatever happened caused Solr's heap usage to
> >>>> skyrocket and require more than 6GB.
> >>>>
> >>>> Thanks,
> >>>> Shawn
> >>>>
> >>
> >>
>
>


Re: Solr cloud questions

2019-08-15 Thread Kojo
Erick,
I am starting to think that my setup has more than one problem.
As I said before, I am not balancing the load across my Solr nodes, and I
have eight nodes. All of my web application requests go to one Solr node, the
only one that dies. If I distribute the load across the other nodes, is it
possible that these problems will go away?

Even if I downsize the SolrCloud setup to 2 boxes with 2 nodes each and fewer
shards than the 16 I have now, I would like to know your opinion about the
question above.

Thank you,
Koji




On Wed, Aug 14, 2019 at 14:15, Erick Erickson wrote:

> Kojo:
>
> On the surface, this is a reasonable configuration. Note that you may
> still want to decrease the Java heap, but only if you have enough “head
> room” for memory spikes.
>
> How do you know if you have “head room”? Unfortunately the only good
> answer is “you have to test”. You can look at the GC logs to see what your
> maximum heap requirements are, then add “some extra”.
>
> Note that there’s a balance here. Let’s say you can run successfully with
> X heap, so you allocate X + 0.1X to the heap. You can wind up spending a
> large amount of time in garbage collection. I.e. GC kicks in and recovers
> _just_ enough memory to continue for a very short while, then goes into
> another GC cycle. You don’t hit OOMs, but your system is slow.
>
> OTOH, let’s say you need X and allocate 3X. Garbage will accumulate and
> full GCs are rarer, but when they occur they take longer.
>
> And the G1GC collector is the current preference.
>
> As I said, testing is really the only way to determine what the magic
> number is.
>
> Best,
> Erick
>
> > On Aug 14, 2019, at 9:20 AM, Kojo  wrote:
> >
> > Shawn,
> >
> > Only my web application accesses this Solr. At a first look at the HTTP
> > server logs I didn't find anything unusual. Sometimes a very big crawler
> > hits my servers; that was my first bet.
> >
> > No scheduled cron jobs were running at that time either.
> >
> > I think that I will reconfigure my boxes with two Solr nodes each instead
> > of four and increase the heap to 16GB. Each box only runs Solr and has
> > 64GB. Each Solr will use 16GB and the box will still have 32GB for the
> > OS. What do you think?
> >
> > This is a production server, so I will plan to migrate.
> >
> > Regards,
> > Koji
> >
> >
> > On Tue, Aug 13, 2019 at 12:58, Shawn Heisey wrote:
> >
> >> On 8/13/2019 9:28 AM, Kojo wrote:
> >>> Here are the last two gc logs:
> >>>
> >>>
> >>
> https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ
> >>
> >> Thank you for that.
> >>
> >> Analyzing the 20MB gc log actually looks like a pretty healthy system.
> >> That log covers 58 hours of runtime, and everything looks very good to
> me.
> >>
> >> https://www.dropbox.com/s/yu1pyve1bu9maun/gc-analysis-kojo.png?dl=0
> >>
> >> But the small log shows a different story.  That log only covers a
> >> little more than four minutes.
> >>
> >> https://www.dropbox.com/s/vkxfoihh12brbnr/gc-analysis-kojo2.png?dl=0
> >>
> >> What happened at approximately 10:55:15 PM on the day that the smaller
> >> log was produced?  Whatever happened caused Solr's heap usage to
> >> skyrocket and require more than 6GB.
> >>
> >> Thanks,
> >> Shawn
> >>
>
>


Re: Solr cloud questions

2019-08-14 Thread Kojo
Shawn,

Only my web application accesses this Solr. At a first look at the HTTP
server logs I didn't find anything unusual. Sometimes a very big crawler hits
my servers; that was my first bet.

No scheduled cron jobs were running at that time either.

I think that I will reconfigure my boxes with two Solr nodes each instead of
four and increase the heap to 16GB. Each box only runs Solr and has 64GB.
Each Solr will use 16GB and the box will still have 32GB for the OS. What do
you think?

This is a production server, so I will plan to migrate.

Regards,
Koji


On Tue, Aug 13, 2019 at 12:58, Shawn Heisey wrote:

> On 8/13/2019 9:28 AM, Kojo wrote:
> > Here are the last two gc logs:
> >
> >
> https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ
>
> Thank you for that.
>
> Analyzing the 20MB gc log actually looks like a pretty healthy system.
> That log covers 58 hours of runtime, and everything looks very good to me.
>
> https://www.dropbox.com/s/yu1pyve1bu9maun/gc-analysis-kojo.png?dl=0
>
> But the small log shows a different story.  That log only covers a
> little more than four minutes.
>
> https://www.dropbox.com/s/vkxfoihh12brbnr/gc-analysis-kojo2.png?dl=0
>
> What happened at approximately 10:55:15 PM on the day that the smaller
> log was produced?  Whatever happened caused Solr's heap usage to
> skyrocket and require more than 6GB.
>
> Thanks,
> Shawn
>


Re: Solr cloud questions

2019-08-13 Thread Kojo
Shawn,
Here are the last two gc logs:

https://send.firefox.com/download/6cc902670aa6f7dd/#Ee568G9vUtyK5zr-nAJoMQ


Thank you,
Koji


On Tue, Aug 13, 2019 at 09:33, Shawn Heisey wrote:

> On 8/13/2019 6:19 AM, Kojo wrote:
> > --
> > tail -f  node1/logs/solr_oom_killer-8983-2019-08-11_22_57_56.log
> > Running OOM killer script for process 38788 for Solr on port 8983
> > Killed process 38788
> > --
>
> Based on what I can see, a 6GB heap is not big enough for the setup
> you've got.  There are two ways to deal with an OOME problem.  1)
> Increase the resource that was depleted.  2) Change the configuration so
> the program needs less of that resource.
>
>
> https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems#SolrPerformanceProblems-JavaHeap
>
> > tail -50   node1/logs/archived/solr_gc.log.4.current
>
> To be useful, we will need the entire GC log, not a 50 line subset.  In
> the subset, I can see that there was a full GC that did absolutely
> nothing -- no memory was freed.  This is evidence that your heap is too
> small.  You will need to use a file sharing site and provide a URL for
> the entire GC log - email attachments rarely make it to the list.  The
> bigger the log is, the better idea we can get about what heap size you
> need.
>
> Thanks,
> Shawn
>


Re: Solr cloud questions

2019-08-13 Thread Kojo
 concurrent mark-sweep generation total 4718592K, used 4718591K
[0x0006a000, 0x0007c000, 0x0007c000)
 Metaspace   used 50496K, capacity 51788K, committed 53140K, reserved
1097728K
  class spaceused 5001K, capacity 5263K, committed 5524K, reserved
1048576K
}
{Heap before GC invocations=34293 (full 254):
 par new generation   total 1310720K, used 1310719K [0x00064000,
0x0006a000, 0x0006a000)
  eden space 1048576K,  99% used [0x00064000, 0x00067fe8,
0x00068000)
  from space 262144K,  99% used [0x00069000, 0x00069f98,
0x0006a000)
  to   space 262144K,   0% used [0x00068000, 0x00068000,
0x00069000)
 concurrent mark-sweep generation total 4718592K, used 4718591K
[0x0006a000, 0x0007c000, 0x0007c000)
 Metaspace   used 50496K, capacity 51788K, committed 53140K, reserved
1097728K
  class spaceused 5001K, capacity 5263K, committed 5524K, reserved
1048576K
2019-08-11T22:57:50.408-0300: 802528.065: [Full GC (Allocation Failure)
2019-08-11T22:57:50.408-0300: 802528.065: [CMS:
4718591K->4718591K(4718592K), 5.5953203 secs] 6029311K->6029311K(6029312K),
[Metaspace: 50496K->50496K(1097728K)], 5.5954659 secs] [Times: user=5.60
sys=0.00, real=5.60 secs]
--

On Mon, Aug 12, 2019 at 13:26, Shawn Heisey wrote:

> On 8/12/2019 5:47 AM, Kojo wrote:
> > I am using Solr cloud on this configuration:
> >
> > 2 boxes (one Solr in each box)
> > 4 instances per box
>
> Why are you running multiple instances on one server?  For most setups,
> this has too much overhead.  A single instance can handle many indexes.
> The only good reason I can think of to run multiple instances is when
> the amount of heap memory needed exceeds 31GB.  And even then, four
> instances seems excessive.  If you only have 300,000 documents, there
> should be no reason for a super large heap.
>
> > At this moment I have one active collection with about 300,000 docs. The
> > other collections are not being queried. The active collection is
> > configured:
> > - shards: 16
> > - replication factor: 2
> >
> > These two Solrs (Solr1 and Solr2) use ZooKeeper (one box, one instance;
> > no ZooKeeper cluster).
> >
> > My application points to Solr1, and everything works fine, until suddenly
> > one instance of this Solr1 dies. This instance is on port 8983, the "main"
> > instance. I thought it could be related to memory usage, but we increased
> > RAM and JVM memory and it still dies.
> > Solr1, the one which dies, is the one my web application points to.
>
> You will have to check the logs.  If Solr is not running on Windows,
> then any OutOfMemoryError exception, which can be caused by things other
> than a memory shortage, will result in Solr terminating itself.  On
> Windows, that functionality does not yet exist, so it would have to be
> Java or the OS that kills it.
>
> > Here I have two questions that I hope you can help me:
> >
> > 1. Which log can I look at to debug this issue?
>
> Assuming you're NOT on Windows, check to see if there is a logfile named
> solr_oom_killer-8983.log in the logs directory where solr.log lives.  If
> there is, then that means the oom killer script was executed, and that
> happens when there is an OutOfMemoryError thrown.  The solr.log file
> MIGHT contain the OOME exception which will tell you what system
> resource was depleted.  If it was not heap memory that was depleted,
> then increasing memory probably won't help.
>
> If you share the gc log that Solr writes, we can analyze this to see if
> it was heap memory that was depleted.
>
> > 2. After this instance dies, the Solr cloud does not answer my web
> > application. Is this correct? I thought that the replicas should answer
> > if one shard, instance, or box goes down.
>
> If a Solr instance dies, you can't make connections directly to it.
> Connections would need to go to another instance.  You need a load
> balancer to handle that automatically, or a cloud-aware client.  The
> only cloud-aware client that I am sure about is the one for Java -- it
> is named SolrJ, created by the Solr project and distributed with Solr.
> I think that a third party MIGHT have written a cloud-aware client for
> Python, but I am not sure about this.
>
> If you set up a load balancer, you will need to handle redundancy for that.
>
> Side note:  A fully redundant zookeeper install needs three servers.  Do
> not put a load balancer in front of zookeeper.  The ZK protocol handles
> redundancy itself and a load balancer will break that.
>
> Thanks.
> Shawn
>


Solr cloud questions

2019-08-12 Thread Kojo
Hi,
I am using Solr cloud on this configuration:

2 boxes (one Solr in each box)
4 instances per box

At this moment I have one active collection with about 300,000 docs. The
other collections are not being queried. The active collection is
configured:
- shards: 16
- replication factor: 2

These two Solrs (Solr1 and Solr2) use ZooKeeper (one box, one instance; no
ZooKeeper cluster).

My application points to Solr1, and everything works fine, until suddenly one
instance of this Solr1 dies. This instance is on port 8983, the "main"
instance. I thought it could be related to memory usage, but we increased RAM
and JVM memory and it still dies.
Solr1, the one which dies, is the one my web application points to.

Here I have two questions that I hope you can help me:

1. Which log can I look at to debug this issue?
2. After this instance dies, the Solr cloud does not answer my web
application. Is this correct? I thought that the replicas should answer if
one shard, instance, or box goes down.

Regards,
Koji


synonyms.txt -> minimum match (~ mm)

2019-06-19 Thread Kojo
I have a synonyms.txt mapping some words.

On my Solr 4.9, when I search for a word that is in synonyms.txt, the debug
output shows:

"rawquerystring": "interleucina-6",
"querystring": "interleucina-6",
"parsedquery": "(+DisjunctionMaxQuerytext:interleucin
text:interleucin text:6 text:6)~4/no_coord",
"parsedquery_toString": "+(((text:interleucin text:interleucin
text:6 text:6)~4))",

On my Solr 6.6, the same search doesn't include the ~4:

"rawquerystring":"interleucina-6",
"querystring":"interleucina-6",
"parsedquery":"(+DisjunctionMaxQuery(((text:interleucin
text:interleucin text:6 text:6/no_coord",
"parsedquery_toString":"+((text:interleucin text:interleucin
text:6 text:6))",


My problem is that I don't know how to configure Solr 6.6 to behave like Solr
4.9 and apply the minimum match to the words from synonyms.txt.

In solrconfig.xml I tried to set the mm parameter to 100% in the request
handler. But when I do this, it applies to all the words that I query, not
only those from synonyms.txt.

All of the above was tested on the Solr dashboard; there is no application
layer transforming the queries.

Could someone point out what I am doing wrong?

Thank you very much.

Koji


Zookeeper solr config files

2019-05-06 Thread Kojo
This is a ZooKeeper question, but I wonder if you can help me.

Is it possible to directly version SolrCloud config files on ZooKeeper using
Git or any other versioning system? Or do I really need to use the ZooKeeper
CLI?

When I say versioning directly on ZooKeeper, I mean versioning a ZooKeeper
folder on the filesystem using Git.
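
To make the idea concrete, what I have in mind is something like the sketch
below, which copies the config tree out of ZooKeeper into a local folder that
Git can track. It assumes the kazoo Python library and the default /configs
layout, and it is untested:

    import os
    from kazoo.client import KazooClient

    def dump_config(zk_hosts, config_name, out_dir):
        zk = KazooClient(hosts=zk_hosts)
        zk.start()
        root = "/configs/" + config_name

        def walk(path):
            data, _stat = zk.get(path)
            children = zk.get_children(path)
            if not children and data:
                # Leaf znode: write it as a file under out_dir.
                local = os.path.join(out_dir, os.path.relpath(path, root))
                os.makedirs(os.path.dirname(local), exist_ok=True)
                with open(local, "wb") as f:
                    f.write(data)
            for child in children:
                walk(path + "/" + child)

        walk(root)
        zk.stop()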

Thank you,
Koji


Re: gatherNodes question. Is this a bug?

2019-04-11 Thread Kojo
on="process_number=node"),
process_number as node),fl="type_status_facet , amount",
on="node=process_number",)

{
  "result-set": {
"docs": [
  {
"node": "01/02444-7",
"type_status_facet ": "Ongoing projetcs",
"amount": 154620
  },
  {
"node": "01/08149-7",
"type_status_facet ": "Ongoing projetcs",
"amount": 131115
  },
  {
"node": "01/21749-3",
"type_status_facet ": "Ongoing projetcs",
"amount": 157300
  },
  {
"node": "01/22503-8",
"type_status_facet ": "Ongoing projetcs",
"amount": 154800
  },
  {
"EOF": true,
"RESPONSE_TIME": 24
  }
]
  }
}

4. Trying to gather more nodes. Here is the problem: the result-set comes
from the inner gatherNodes.

gatherNodes( my_collection, gatherNodes( my_collection,
fetch(my_collection, select(
complement(
search(my_collection, qt="/export", q=*:*, fl="process_number",
sort="process_number asc",  fq=ancestor:*, fq=situacao:("On going"),
fq=area_pt:("Ciências Exatas e da Terra"),
fq=auxilio_pesquisa_pt:("07;Auxílio Pesquisador|00;Auxilio Pesquisador -
Brasil")),
sort(
gatherNodes( my_collection, gatherNodes( my_collection,
search(my_collection, qt="/export", q=*:*,
fl="process_number", sort="process_number asc",  fq=-ancestor:*,
fq=situacao:("On going"), fq=area_pt:("Ciências Exatas e da Terra"),
fq=auxilio_pesquisa_pt:("07;Auxílio Pesquisador|00;Auxilio Pesquisador -
Brasil")),
walk="process_number->ancestor", trackTraversal="true",
gather="process_number"), walk="node->ancestor", trackTraversal="true",
gather="process_number", scatter="branches, leaves"),
by="node asc"),
on="process_number=node"),
process_number as node),fl="type_status_facet , amount",
on="node=process_number"),
walk="node->ancestor", trackTraversal="true", gather="process_number"),
walk="node->ancestor", trackTraversal="true", gather="process_number",
scatter="branches, leaves")

{
  "result-set": {
"docs": [
  {
"node": "01/20577-4",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"node": "01/19764-4",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"node": "01/09299-2",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"node": "01/21532-4",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"node": "01/11664-0",
"collection": "my_collection",
"field": "process_number",
"ancestors": [],
"level": 0
  },
  {
"EOF": true,
"RESPONSE_TIME": 30
  }
]
  }
}




5. Just one more piece of information, maybe not necessary.
Enclosing step 3 with rollup: check the "level" field on the result-set, it
is NULL.


rollup(having(sort(
fetch(my_collection, select(
complement(
search(my_collection, qt="/export", q=*:*, fl="process_number",
sort="process_number asc",  fq=ancestor:*, fq=situacao:("On going"),
fq=area_pt:("Ciências Exatas e da Terra"),
fq=auxilio_pesquisa_pt:("07;Auxílio Pesquisador|00;Auxilio Pesquisador -
Brasil")),
sort(
gatherNodes( my_collection, gatherNodes( my_collection,
search(my_collection, qt="/export", q=*:*,
fl="process_number", sort="process_number asc",  fq=-ancestor:*,
fq=situacao:("On going"), fq=area_pt:("Ciências Exatas e da Terra"),
fq=auxilio_pesquisa_pt:("07;Auxílio Pesquisador|00;Auxilio Pesquisador -
Brasil")),
walk="process_number->ancestor", trackTraversal="true",
gather="process_number"), walk="node->ancestor", trackTraversal="true",
gather="process_number", scatter="branches, leaves"),
by="node asc"),
on="process_number=node"),
process_number as node),fl="type_status_facet , amount",
on="node=process_number"),
by="level asc, type_status_facet  asc"),gt(if(eq(amount,null), 0,
amount),0)), over="level, type_status_facet ", sum(amount), count(*))

{
  "result-set": {
"docs": [
  {
"sum(amount)": 597835,
"count(*)": 4,
"type_status_facet ": "Ongoing projetcs",
"level": "NULL"
  },
  {
"EOF": true,
"RESPONSE_TIME": 26
  }
]
  }
}

On Wed, Apr 10, 2019 at 16:49, Joel Bernstein wrote:

> What you're trying to do should work. Possibly if you provide more detail
> like the full query with some sample outputs I might be able to see what
> the issue is.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Apr 10, 2019 at 10:55 AM Kojo  wrote:
>
> > Hello everybody, I have a question about Streaming Expression/Graph
> > Traversal.
> >
> > The following pseudocode works fine:
> >
> > complement( search(),
> > sort(
> > gatherNodes( collection, search())
> > ),
> > )
> >
> >
> > However, when I feed the SE resultset above to another gatherNodes
> > function, I have a result different from what I expected. It returns the
> > root nodes (branches) of the inner gatherNodes:
> >
> > gatherNodes( collection,
> > complement( search(),
> > sort(
> >gatherNodes( collection, search())
> > ),
> > ),
> > )
> >
> > In the case I tested, the outer gatherNodes does not have leaves. I was
> > expecting the result from the "complement" function to be the root nodes
> > of the outer gatherNodes function. Do you know how I can achieve this?
> >
> > Thank you,
> >
>


gatherNodes question. Is this a bug?

2019-04-10 Thread Kojo
Hello everybody, I have a question about Streaming Expression/Graph
Traversal.

The following pseudocode works fine:

complement( search(),
sort(
gatherNodes( collection, search())
),
)


However, when I feed the SE resultset above to another gatherNodes
function, I have a result different from what I expected. It returns the
root nodes (branches) of the inner gatherNodes:

gatherNodes( collection,
complement( search(),
sort(
   gatherNodes( collection, search())
),
),
)

In the case I tested, the outer gatherNodes does not have leaves. I was
expecting the result from the "complement" function to be the root nodes of
the outer gatherNodes function. Do you know how I can achieve this?

Thank you,


Re: Dealing with null values in streaming rollup

2018-10-22 Thread Kojo
I think that you can use stream evaluators in your expressions to filter
the values you want:

https://lucene.apache.org/solr/guide/6_6/stream-evaluators.html
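
For example, the select/replace approach you mention below could be written
along these lines (an untested sketch based on your query; replace() turns
the null sales into 0 before the rollup):

rollup(
  select(
    search(metrics_data, q=id:123, fl="week_no,sales,qty", qt="/export",
           sort="week_no desc"),
    week_no, qty, sales, replace(sales, null, withValue=0)
  ),
  over="week_no",
  sum(sales),
  sum(qty)
)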





On Mon, Oct 22, 2018 at 12:10, RAUNAK AGRAWAL wrote:

> Thanks a lot Jan. Will try with 7.5
>
> I am currently using 7.2.1 version. Is there a way to fix it?
>
> On Fri, Oct 19, 2018 at 12:31 AM Jan Høydahl 
> wrote:
>
> > Have you tried with Solr 7.5? I think it may have been fixed in that
> > version? At least for the timeseries() expression...
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> > > On Oct 18, 2018, at 05:35, RAUNAK AGRAWAL wrote:
> > >
> > > Hi,
> > >
> > > I am trying to use streaming rollup expression to aggregate the sales
> > > values over week. Here is the query:
> > >
> > > curl http://localhost:8983/solr/metrics_data/stream -d 'expr=rollup(
> > >   search(metrics_data, q=id:123, fl="week_no,sales,qty", qt="/export",
> > > sort="week_no desc"),
> > >  over="week",
> > >   sum(sales),
> > >   sum(qty)
> > > )'
> > >
> > > But I am getting exception like:
> > >
> > > {
> > > "result-set": {
> > > "docs": [{
> > > "EXCEPTION": null,
> > > "EOF": true,
> > > "RESPONSE_TIME": 169
> > > }]
> > > }
> > > }
> > >
> > > The reason being that some of the documents have null as sales. One
> > > option is to wrap the search with a select expression
> > > with replace(field,null,withValue=0). Is there any other way for rollup
> > > to ignore those docs which have some fields as null?
> > >
> > > Thanks in advance
> >
> >
>


Re: Streaming Expressions - gatherNodes

2018-09-13 Thread Kojo
Joel,
The results of the expressions are right. The problem was that I was
expecting the wrong return values.

I was not using the "scatter" parameter, and by default gatherNodes emits
only the leaf nodes. So when I increased the input to the function, it was
resolving the nodes at level zero only (branches), not showing any leaf
results. When I decreased the input, some nodes did not resolve at the branch
level and resolved at the leaves instead, so some nodes were counted.

I will perform more tests, but I think that everything is working fine.

Thanks for your patience and sorry for the false alarm.

Koji


On Thu, Sep 13, 2018 at 14:22, Kojo wrote:

> I can do that.
> I will tell you when I open the Jira.
>
>
>
> On Thu, Sep 13, 2018 at 14:05, Joel Bernstein wrote:
>
>> I'll have to take a look and see if I can reproduce this exact behavior.
>> Let's create a jira ticket so we can track the discussion.
>>
>>
>> Joel Bernstein
>> http://joelsolr.blogspot.com/
>>
>>
>> On Thu, Sep 13, 2018 at 1:03 PM Kojo  wrote:
>>
>> > Same query feeding 25000 tuples to gatherNodes:
>> >
>> > gatherNodes(graph_auxilios,
>> >   search(graph_auxilios, zkHost="localhost:9983",qt="/select",
>> rows=25000,
>> > q=*:*,  fl="numero_processo", sort="numero_processo asc"),
>> >   walk="numero_processo->projeto_pai",
>> >   gather="numero_processo",
>> > )
>> >
>> >
>> > About 350 nodes returned. This is the last one:
>> >
>> >   {
>> > "node": "97/13422-6",
>> > "collection": "graph_auxilios",
>> > "field": "numero_processo",
>> > "level": 1
>> >   },
>> >   {
>> > "EOF": true,
>> > "RESPONSE_TIME": 3996
>> >   }
>> > ]
>> >   }
>> > }
>> >
>> >
>> >
>> > This is the log:
>> >
>> > INFO  - 2018-09-13 16:55:05.949; [c:graph_auxilios s:shard6 r:core_node3
>> > x:graph_auxilios_shard6_replica1] org.apache.solr.core.SolrCore;
>> > [graph_auxilios_shard6_replica1]  webapp=/solr path=/export
>> >
>> >
>> params={q={!terms+f%3Dprojeto_pai}95/07551-2,95/07553-5,95/07554-1,95/07555-8,95/07557-0,95/07561-8,95/07571-3,95/07572-0,95/07575-9,95/07583-1,95/07591-4,95/07592-0,95/07595-0,95/07619-6,95/07624-0,95/07631-6,95/07633-9,95/07635-1,95/07637-4,95/07639-7,95/07641-1,95/07646-3,95/07648-6,95/07650-0,95/07652-3,95/07653-0,95/07654-6,95/07660-6,95/07661-2,95/07662-9,95/07665-8,95/07666-4,95/07667-0,95/07668-7,95/07669-3,95/07672-4,95/07673-0,95/07674-7,95/07675-3,95/07682-0,95/07686-5,95/07688-8,95/07689-4,95/07692-5,95/07695-4,95/07698-3,95/07699-0,95/07700-8,95/07706-6,95/07711-0,95/07712-6,95/07717-8,95/07726-7,95/07729-6,95/07730-4,95/07739-1,95/07741-6,95/07745-1,95/07746-8,95/07748-0,95/07750-5,95/07755-7,95/07757-0,95/07759-2,95/07763-0,95/07764-6,95/07765-2,95/07767-5,95/07768-1,95/07769-8,95/07775-8,95/07784-7,95/07789-9,95/07796-5,95/07799-4,95/07802-5,95/07803-1,95/07807-7,95/07812-0,95/07813-7,95/07821-0,95/07824-9,95/07826-1,95/07828-4,95/07830-9,95/07831-5,95/07833-8,95/07834-4,95/07838-0,95/07839-6,95/07840-4,95/07843-3,95/07844-0,95/07849-1,95/07851-6,95/07852-2,95/07853-9,95/07855-1,95/07858-0,95/07868-6,95/07869-2,95/07873-0,95/07874-6,95/07876-9,95/07877-5,95/07881-2,95/07883-5,95/07884-1,95/07885-8,95/07886-4,95/07888-7,95/07892-4,95/07902-0,95/07914-8,95/07915-4,95/07917-7,95/07920-8,95/07922-0,95/07928-9,95/07930-3,95/07949-6,95/07951-0,95/07952-7,95/07960-0,95/07961-6,95/07962-2,95/07969-7,95/07981-7,95/07986-9,95/07997-0,95/08000-0,95/08001-6,95/08003-9,95/08008-0,95/08021-7,95/08034-1,95/08038-7,95/08041-8,95/08042-4,95/08046-0,95/08049-9,95/08050-7,95/08051-3,95/08053-6,95/08056-5,95/08063-1,95/08066-0,95/08070-8,95/08073-7,95/08078-9,95/08080-3,95/08092-1,95/08099-6,95/08105-6,95/08106-2,95/08108-5,95/08114-5,95/08118-0,95/08119-7,95/08120-5,95/08123-4,95/08128-6,95/08132-3,95/08134-6,95/08135-2,95/08155-3,95/08156-0,95/08158-2,95/08164-2,95/08165-9,95/08166-5,95/08169-4,95/08171-9,95/08184-3,95/08188-9,95/08191-0,95/08209-6,95/08218-5,95/08235-7,95/08248-1,95/08250-6,95/08254-1,95/08273-6,95/08275-9,95/08277-1,95/08280-2,95/08281-9,95/08287-7,95/08294-3,95/08297-2,95/08301-0,95/08304-9,95/08333-9,95/08336-8,95/08339-7,95/08342-8,95/08343-4,95/08346-3,95/08382-0,95/08386-5,95/08414-9,95/08415-5,95/08453-4,95/08471-2,95/08472-9,95/08475-8,95/08503-1,95/08505-4,95/08524-9,95/08560-5,95/08569-2,95/08583-5,95/08592-4,95/08601-3,95/08634-9,95/08640-9,95/08654-0,95/08706-0,95/087

Re: Streaming Expressions - gatherNodes

2018-09-13 Thread Kojo
I can do that.
I will tell you when I open the Jira.



On Thu, Sep 13, 2018 at 14:05, Joel Bernstein wrote:

> I'll have to take a look and see if I can reproduce this exact behavior.
> Let's create a jira ticket so we can track the discussion.
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Sep 13, 2018 at 1:03 PM Kojo  wrote:
>
> > Same query feeding 25000 tuples to gatherNodes:
> >
> > gatherNodes(graph_auxilios,
> >   search(graph_auxilios, zkHost="localhost:9983",qt="/select",
> rows=25000,
> > q=*:*,  fl="numero_processo", sort="numero_processo asc"),
> >   walk="numero_processo->projeto_pai",
> >   gather="numero_processo",
> > )
> >
> >
> > About 350 nodes returned. This is the last one:
> >
> >   {
> > "node": "97/13422-6",
> > "collection": "graph_auxilios",
> > "field": "numero_processo",
> > "level": 1
> >   },
> >   {
> > "EOF": true,
> > "RESPONSE_TIME": 3996
> >   }
> > ]
> >   }
> > }
> >
> >
> >
> > This is the log:
> >
> > INFO  - 2018-09-13 16:55:05.949; [c:graph_auxilios s:shard6 r:core_node3
> > x:graph_auxilios_shard6_replica1] org.apache.solr.core.SolrCore;
> > [graph_auxilios_shard6_replica1]  webapp=/solr path=/export
> >
> >
> params={q={!terms+f%3Dprojeto_pai}95/07551-2,95/07553-5,95/07554-1,95/07555-8,95/07557-0,95/07561-8,95/07571-3,95/07572-0,95/07575-9,95/07583-1,95/07591-4,95/07592-0,95/07595-0,95/07619-6,95/07624-0,95/07631-6,95/07633-9,95/07635-1,95/07637-4,95/07639-7,95/07641-1,95/07646-3,95/07648-6,95/07650-0,95/07652-3,95/07653-0,95/07654-6,95/07660-6,95/07661-2,95/07662-9,95/07665-8,95/07666-4,95/07667-0,95/07668-7,95/07669-3,95/07672-4,95/07673-0,95/07674-7,95/07675-3,95/07682-0,95/07686-5,95/07688-8,95/07689-4,95/07692-5,95/07695-4,95/07698-3,95/07699-0,95/07700-8,95/07706-6,95/07711-0,95/07712-6,95/07717-8,95/07726-7,95/07729-6,95/07730-4,95/07739-1,95/07741-6,95/07745-1,95/07746-8,95/07748-0,95/07750-5,95/07755-7,95/07757-0,95/07759-2,95/07763-0,95/07764-6,95/07765-2,95/07767-5,95/07768-1,95/07769-8,95/07775-8,95/07784-7,95/07789-9,95/07796-5,95/07799-4,95/07802-5,95/07803-1,95/07807-7,95/07812-0,95/07813-7,95/07821-0,95/07824-9,95/07826-1,95/07828-4,95/07830-9,95/07831-5,95/07833-8,95/07834-4,95/07838-0,95/07839-6,95/07840-4,95/07843-3,95/07844-0,95/07849-1,95/07851-6,95/07852-2,95/07853-9,95/07855-1,95/07858-0,95/07868-6,95/07869-2,95/07873-0,95/07874-6,95/07876-9,95/07877-5,95/07881-2,95/07883-5,95/07884-1,95/07885-8,95/07886-4,95/07888-7,95/07892-4,95/07902-0,95/07914-8,95/07915-4,95/07917-7,95/07920-8,95/07922-0,95/07928-9,95/07930-3,95/07949-6,95/07951-0,95/07952-7,95/07960-0,95/07961-6,95/07962-2,95/07969-7,95/07981-7,95/07986-9,95/07997-0,95/08000-0,95/08001-6,95/08003-9,95/08008-0,95/08021-7,95/08034-1,95/08038-7,95/08041-8,95/08042-4,95/08046-0,95/08049-9,95/08050-7,95/08051-3,95/08053-6,95/08056-5,95/08063-1,95/08066-0,95/08070-8,95/08073-7,95/08078-9,95/08080-3,95/08092-1,95/08099-6,95/08105-6,95/08106-2,95/08108-5,95/08114-5,95/08118-0,95/08119-7,95/08120-5,95/08123-4,95/08128-6,95/08132-3,95/08134-6,95/08135-2,95/08155-3,95/08156-0,95/08158-2,95/08164-2,95/08165-9,95/08166-5,95/08169-4,95/08171-9,95/08184-3,95/08188-9,95/08191-0,95/08209-6,95/08218-5,95/08235-7,95/08248-1,95/08250-6,95/08254-1,95/08273-6,95/08275-9,95/08277-1,95/08280-2,95/08281-9,95/08287-7,95/08294-3,95/08297-2,95/08301-0,95/08304-9,95/08333-9,95/08336-8,95/08339-7,95/08342-8,95/08343-4,95/08346-3,95/08382-0,95/08386-5,95/08414-9,95/08415-5,95/08453-4,95/08471-2,95/08472-9,95/08475-8,95/08503-1,95/08505-4,95/08524-9,95/08560-5,95/08569-2,95/08583-5,95/08592-4,95/08601-3,95/08634-9,95/08640-9,95/08654-0,95/08706-0,95/08707-6,95/08713-6,95/08717-1,95/08725-4,95/08737-2,95/08746-1,95/08752-1,95/08775-1,95/08798-1,95/08806-4,95/08823-6,95/08856-1,95/08860-9,95/08865-0,95/08872-7,95/08875-6,95/08881-6,95/08908-1,95/08917-0,95/08918-7,95/08955-0,95/08969-0,95/08973-8,95/08990-0,95/09023-3,95/09045-7,95/09058-1,95/09068-7,95/09080-7,95/09085-9,95/09101-4,95/09108-9,95/09135-6,95/09156-3,95/09159-2,95/09160-0,95/09167-5,95/09202-5,95/09207-7,95/09222-6,95/09260-5,95/09263-4,95/09269-2,95/09290-1,95/09305-9,95/09311-9,95/09319-0,95/09329-5,95/09330-3,95/09361-6,95/09379-2,95/09382-3,95/09417-1,95/09432-0,95/09440-3,95/09531-9,95/09538-3,95/09545-0,95/09552-6,95/09557-8,95/09576-2,95/09577-9,95/09584-5,95/09590-5,95/09599-2,95/09607-5,95/09608-1,95/09625-3,95/09637-1,95/09660-3,95/09716-9,95/09727-0,95/09744-2,95/09755-4,95/09756-0,95/09777-8,95/09780-9,95/09795-6,95/09809-7,95/09810-5,95/09812-8,95/09825-2

Re: Streaming Expressions - gatherNodes

2018-09-13 Thread Kojo
00554-9,96/00568-0,96/00589-7,96/00592-8,96/00596-3,96/00603-0,96/00604-6,96/00612-9,96/00613-5,96/00621-8,96/00640-2,96/00641-9,96/00657-2,96/00675-0,96/00699-7,96/00700-5,96/00703-4,96/00753-1,96/00756-0,96/00768-9,96/00769-5,96/00770-3,96/00773-2,96/00779-0,96/00782-1,96/00793-3,96/00829-8,96/00830-6,96/00832-9,96/00837-0,96/00852-0,96/00873-7=false=numero_processo,projeto_pai=numero_processo+asc,projeto_pai+asc=json=2.2}
hits=4 status=0 QTime=1


Reading your last message again, I am not sure whether you were asking a
different question from what these logs I sent now show.

I can run different tests if you need more information.

Thanks for your help Joel.




On Thu, Sep 13, 2018 at 13:50, Joel Bernstein wrote:

> I see that hits=0 in this log request. Are there log requests that show
> results found for one of these queries?
>
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Sep 13, 2018 at 10:15 AM Kojo  wrote:
>
> > I have just run this expression:
> >
> >
> > gatherNodes(graph_auxilios,
> >   search(graph_auxilios, zkHost="localhost:9983",qt="/select",
> rows=3,
> > q=*:*,  fl="numero_processo", sort="numero_processo asc"),
> >   walk="numero_processo->projeto_pai",
> >   gather="numero_processo",
> > )
> >
> >
> >
> > This is the response:
> >
> > { "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 5091 } ] } }
> >
> >
> >
> > Many logs appear until it stops quietly. This is the last line in
> > /opt/solr-6.6.2/example/cloud/node1/logs:
> >
> > INFO  - 2018-09-13 14:09:20.783; [c:graph_auxilios s:shard6 r:core_node3
> > x:graph_auxilios_shard6_replica1] org.apache.solr.core.SolrCore;
> > [graph_auxilios_shard6_replica1]  webapp=/solr path=/export
> >
> >
> params={q={!terms+f%3Dprojeto_pai}16/25417-0,16/25418-6,16/25423-0,16/25425-2,16/25426-9,16/25429-8,16/25436-4,16/25437-0,16/25441-8,16/25442-4,16/25446-0,16/25448-2,16/25450-7,16/25452-0,16/25457-1,16/25459-4,16/25463-1,16/25464-8,16/25467-7,16/25468-3,16/25469-0,16/25471-4,16/25474-3,16/25475-0,16/25476-6,16/25478-9,16/25480-3,16/25483-2,16/25484-9,16/25485-5,16/25486-1,16/25489-0,16/25490-9,16/25498-0,16/25499-6,16/25500-4,16/25501-0,16/25502-7,16/25504-0,16/25506-2,16/25509-1,16/25520-5,16/25521-1,16/25524-0,16/25526-3,16/25528-6,16/25529-2,16/25531-7,16/25532-3,16/25533-0,16/25535-2,16/25537-5,16/25538-1,16/25540-6,16/25544-1,16/25546-4,16/25549-3,16/25551-8,16/25556-0,16/25557-6,16/25562-0,16/25563-6,16/25565-9,16/25573-1,16/25574-8,16/25583-7,16/25590-3,16/25594-9,16/25599-0,16/25600-9,16/25604-4,16/25609-6,16/25610-4,16/25611-0,16/25613-3,16/25615-6,16/25617-9,16/25619-1,16/25621-6,16/25626-8,16/25632-8,16/25634-0,16/25635-7,16/25637-0,16/25639-2,16/25642-3,16/25644-6,16/25647-5,16/25651-2,16/25652-9,16/25658-7,16/25659-3,16/25661-8,16/25669-9,16/25676-5,16/25680-2,16/25681-9,16/25682-5,16/25683-1,16/25685-4,16/25686-0,16/25687-7,16/25698-9,16/25703-2,16/25705-5,16/25706-1,16/25708-4,16/25717-3,16/25729-1,16/25730-0,16/25734-5,16/25735-1,16/25737-4,16/25745-7,16/25747-0,16/25749-2,16/25750-0,16/25751-7,16/25755-2,16/25764-1,16/25766-4,16/25769-3,16/25771-8,16/25776-0,16/25778-2,16/25779-9,16/25782-0,16/25783-6,16/25784-2,16/25785-9,16/25791-9,16/25793-1,16/25795-4,16/25798-3,16/25800-8,16/25806-6,16/25810-3,16/25813-2,16/25817-8,16/25821-5,16/25824-4,16/25826-7,16/25831-0,16/25832-7,16/25833-3,16/25836-2,16/25837-9,16/25840-0,16/25845-1,16/25847-4,16/25851-1,16/25853-4,16/25855-7,16/25861-7,16/25864-6,16/25865-2,16/25866-9,16/25867-5,16/25868-1,16/25870-6,16/25876-4,16/25878-7,16/25882-4,16/25883-0,16/25884-7,16/25891-3,16/25893-6,16/25894-2,16/25895-9,16/25896-5,16/25898-8,16/25900-2,16/25901-9,16/25905-4,16/25907-7,16/25910-8,16/25913-7,16/25914-3,16/25915-0,16/25916-6,16/25917-2,16/25919-5,16/25920-3,16/25925-5,16/25926-1,16/25928-4,16/25929-0,16/25931-5,16/25933-8,16/25935-0,16/25938-0,16/25945-6,16/25946-2,16/25947-9,16/25952-2,16/25953-9,16/25955-1,16/25959-7,16/25962-8,16/25963-4,16/25964-0,16/25965-7,16/25967-0,16/25968-6,16/25969-2,16/25970-0,16/25971-7,16/25979-8,16/25984-1,16/25985-8,16/25986-4,16/25987-0,16/25988-7,16/25990-1,16/25996-0,16/25999-9,16/26000-5,16/26001-1,16/26003-4,16/26011-7,16/26012-3,16/26013-0,16/26014-6,16/26016-9,16/26022-9,16/26024-1,16/26026-4,16/26028-7,16/26029-3,16/26030-1,16/26031-8,16/26032-4,16/26033-0,16/26034-7,16/26035-3,16/26040-7,16/26041-3,16/26043-6,16/26044-2,16/26050-2,16/26054-8,16/26055-4,16/26060-8,16/26061-4,16/26064-3,16/26069-5,16/26071-0,16/26076-1,16/26080-9,16/26081-5,16/26082-1,16/26083-8,16/26095-6,16/26098-5,16/26101-6,16/26103-9,16/26108-0,16/26110-5,16/26113-4,16/26117-0,16/26118-6,16/26121-7,16/26123-0,16/26124-6,16/26130-6,16/26132-9,16/26135-8,16/26140-1,16/26145-3,

Re: Streaming Expressions - gatherNodes

2018-09-13 Thread Kojo
0,16/50178-9,16/50180-3,16/50183-2,16/50185-5,16/50186-1,16/50187-8,16/50188-4,16/50189-0,16/50192-1,16/50193-8,16/50195-0,16/50196-7,16/50197-3,16/50198-0,16/50199-6,16/50200-4,16/50201-0,16/50204-0,16/50205-6=false=numero_processo,projeto_pai=numero_processo+asc,projeto_pai+asc=json=2.2}
hits=0 status=0 QTime=2



The four nodes seem to stop logging at the same time, when the empty result
is returned.

On Thu, Sep 13, 2018 at 10:34, Joel Bernstein wrote:

> That's odd behavior. What do the logs look like? This will produce a series
> of queries against the projects collection. Are you seeing those in the
> logs? Any errors?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Thu, Sep 13, 2018 at 9:25 AM Kojo  wrote:
>
> > Hi,
> >
> > If I try to feed gatherNodes with more than 25000 tuples, it gives me an
> > empty result-set.
> >
> > gatherNodes(projects,
> >   search(projects, zkHost="localhost:9983",qt="/select", rows=3,
> > q=*:*,  fl="id", sort="id asc"),
> >   walk="id->parent_id",
> >   gather="id",
> > )
> >
> > Response:
> > { "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 4499 } ] } }
> >
> >
> > This works:
> > search(graph_auxilios, zkHost="localhost:9983",qt="/select", rows=3,
> > q=*:*,  fl="id", sort="id asc"),
> >
> > If I feed the gatherNodes with fewer tuples, let's say 25000, it works.
> >
> > Is this expected behaviour?
> >
> > I am on Solr 6.6, and my box has 128GB of RAM. The only piece of infra
> > that is still not set up right is that I am using a standalone ZooKeeper.
> >
>


Streaming Expressions - gatherNodes

2018-09-13 Thread Kojo
Hi,

If I try to feed gatherNodes with more than 25000 tuples, it gives me an
empty result-set.

gatherNodes(projects,
  search(projects, zkHost="localhost:9983",qt="/select", rows=3,
q=*:*,  fl="id", sort="id asc"),
  walk="id->parent_id",
  gather="id",
)

Response:
{ "result-set": { "docs": [ { "EOF": true, "RESPONSE_TIME": 4499 } ] } }


This works:
search(graph_auxilios, zkHost="localhost:9983",qt="/select", rows=3,
q=*:*,  fl="id", sort="id asc"),

If I feed the gatherNodes with fewer tuples, let's say 25000, it works.

Is this expected behaviour?

I am on Solr 6.6, and my box has 128GB of RAM. The only piece of infra that
is still not set up right is that I am using a standalone ZooKeeper.
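
In case it matters, I am sending the expressions from Python roughly like
this (a simplified sketch of my client code):

    import requests

    expr = """
    gatherNodes(projects,
      search(projects, zkHost="localhost:9983", qt="/select", rows=25000,
             q=*:*, fl="id", sort="id asc"),
      walk="id->parent_id",
      gather="id")
    """
    # 25000 rows works; feeding more than that returns the empty result-set.
    resp = requests.post("http://localhost:8983/solr/projects/stream",
                         data={"expr": expr})
    print(resp.json())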


Re: Solr Json Facet

2018-05-09 Thread Kojo
Just for the record, I will describe here what I did to solve this problem.
This is specific to those who are using python/requests and the Solr JSON
Facet API.


I would like to ask another question regarding json facet.
>
> With the GET method, I used to send many fq parameters on the same query,
> each one with its own tag. It was working wonderfully.
>
> With the POST method, posting more than one fq parameter is a little
> complicated, so I was joining all queries into one fq with all the tags.
> When I select the first facet everything seems to be OK, but when I select
> the second facet it "clears" the first filter for the facets, which shows
> all the original values for this second facet, even though the result-set
> is filtered as expected. I will run more tests to understand the mechanics
> of this, but if someone has advice on this subject I would appreciate it a
> lot.
>


I was not aware of how to POST data with repeated keys in Python; that was
due to my unfamiliarity with the details of the protocol and the libraries.
In the python/requests library, one way to send data over POST is to use a
dictionary, which has unique keys by design.

But I realised that I can send the data as a list of tuples like this:
[('q', '*:*'), ('fl', '*'), ('json.facet', facet_fields), ('fq', 'fq_1'),
('fq', 'fq_2'), ('fq', 'fq_3')]

The facet layer of my system now works entirely using the Solr JSON Facet
API over HTTP POST.
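
Putting it together, the request now looks roughly like this (simplified; the
collection name, tags, and fq values are placeholders):

    import requests

    facet_fields = ('{city_colaboration:{type:terms, '
                    'field:city_colaboration, limit:5000}}')
    data = [
        ('q', '*:*'),
        ('fl', '*'),
        ('json.facet', facet_fields),
        # repeated fq keys, each one with its own tag
        ('fq', '{!tag=tag_1}field_1:"value 1"'),
        ('fq', '{!tag=tag_2}field_2:"value 2"'),
    ]
    resp = requests.post('http://localhost:8983/solr/my_collection/select',
                         data=data)
    print(resp.json())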



>
>
>
>
> 2018-05-08 23:54 GMT-03:00 Yonik Seeley <ysee...@gmail.com>:
>
>> Looks like some sort of proxy server in between the python client and
>> the solr server.
>> I would still check first if the output from the python client is
>> correctly escaped/encoded HTTP.
>>
>> One easy way is to use netcat to pretend to be a server:
>> $ nc -l 8983
>> And then point the python client at that and send the request.
>>
>> -Yonik
>>
>>
>> On Tue, May 8, 2018 at 9:17 PM, Kojo <rbsnk...@gmail.com> wrote:
>> > Thank you all. I tried escaping but it is still not working.
>> >
>> > Yonik, I am using Python Requests. It works if my fq is a single word,
>> even
>> > if I use double quotes on this single word without escaping.
>> >
>> > This is the HTTP response:
>> >
>> > response.content
>> > 
>> > '\n\n400 Bad
>> > Request\n\nBad Request\nYour browser sent
>> > a request that this server could not understand.\n\n\nApache/2.2.15 (Oracle) Server at leydenh Port
>> > 80\n\n'
>> >
>> >
>> > Thank you,
>> >
>> >
>> >
>> > 2018-05-08 18:46 GMT-03:00 Yonik Seeley <ysee...@gmail.com>:
>> >
>> >> On Tue, May 8, 2018 at 1:36 PM, Kojo <rbsnk...@gmail.com> wrote:
>> >> > If I tag the fq query and I query for a simple word, it works fine
>> >> > too. But if I query a multi-word value with a space in the middle, it
>> >> > breaks:
>> >>
>> >> Most likely the full query is not getting to Solr because of an HTTP
>> >> protocol error (i.e. the request is not encoded correctly).
>> >> How are you sending your request to Solr (with curl, or with some other
>> >> method?)
>> >>
>> >> -Yonik
>> >>
>>
>
>


Re: Solr Json Facet

2018-05-08 Thread Kojo
Everything is working now. The code is not that clean and I am rewriting it,
so I don't know exactly what was wrong, but something was malformed.

I would like to ask another question regarding json facet.

With the GET method, I used to send many fq parameters on the same query,
each one with its own tag. It was working wonderfully.

With the POST method, posting more than one fq parameter is a little
complicated, so I was joining all queries into one fq with all the tags. When
I select the first facet everything seems to be OK, but when I select the
second facet it "clears" the first filter for the facets, which shows all the
original values for this second facet, even though the result-set is filtered
as expected. I will run more tests to understand the mechanics of this, but
if someone has advice on this subject I would appreciate it a lot.

Thank you,





2018-05-08 23:54 GMT-03:00 Yonik Seeley <ysee...@gmail.com>:

> Looks like some sort of proxy server in between the python client and
> the solr server.
> I would still check first if the output from the python client is
> correctly escaped/encoded HTTP.
>
> One easy way is to use netcat to pretend to be a server:
> $ nc -l 8983
> And then point the python client at that and send the request.
>
> -Yonik
>
>
> On Tue, May 8, 2018 at 9:17 PM, Kojo <rbsnk...@gmail.com> wrote:
> > Thank you all. I tried escaping but it is still not working.
> >
> > Yonik, I am using Python Requests. It works if my fq is a single word,
> even
> > if I use double quotes on this single word without escaping.
> >
> > This is the HTTP response:
> >
> > response.content
> > 
> > '\n\n400 Bad
> > Request\n\nBad Request\nYour browser sent
> > a request that this server could not understand.\n\n\nApache/2.2.15 (Oracle) Server at leydenh Port
> > 80\n\n'
> >
> >
> > Thank you,
> >
> >
> >
> > 2018-05-08 18:46 GMT-03:00 Yonik Seeley <ysee...@gmail.com>:
> >
> >> On Tue, May 8, 2018 at 1:36 PM, Kojo <rbsnk...@gmail.com> wrote:
> >> > If I tag the fq query and I query for a simple word, it works fine
> >> > too. But if I query a multi-word value with a space in the middle, it
> >> > breaks:
> >>
> >> Most likely the full query is not getting to Solr because of an HTTP
> >> protocol error (i.e. the request is not encoded correctly).
> >> How are you sending your request to Solr (with curl, or with some other
> >> method?)
> >>
> >> -Yonik
> >>
>


Re: Solr Json Facet

2018-05-08 Thread Kojo
Thank you all. I tried escaping but it is still not working.

Yonik, I am using Python Requests. It works if my fq is a single word, even
if I use double quotes on this single word without escaping.

This is the HTTP response:

response.content

'\n\n400 Bad
Request\n\nBad Request\nYour browser sent
a request that this server could not understand.\n\n\nApache/2.2.15 (Oracle) Server at leydenh Port
80\n\n'


Thank you,



2018-05-08 18:46 GMT-03:00 Yonik Seeley <ysee...@gmail.com>:

> On Tue, May 8, 2018 at 1:36 PM, Kojo <rbsnk...@gmail.com> wrote:
> > If I tag the fq query and I query for a simple word, it works fine too.
> > But if I query a multi-word value with a space in the middle, it breaks:
>
> Most likely the full query is not getting to Solr because of an HTTP
> protocol error (i.e. the request is not encoded correctly).
> How are you sending your request to Solr (with curl, or with some other
> method?)
>
> -Yonik
>


Solr Json Facet

2018-05-08 Thread Kojo
Hello,
recently I have changed the way I get facet data from Solr. I was using the
GET method but, due to the limit on the query length, I changed to the POST
method.

Below is a sample of the data I send to Solr in order to get facets. But
there is something here that I don't understand.

If I do not tag the fq query, it works fine:
{'q':'*:*', 'fl': '*', 'fq':'city_colaboration:"College Station"',
'json.facet': '{city_colaboration:{type:terms, field:city_colaboration,
limit:5000}}'}

If I tag the fq query and I query for a simple word, it works fine too. But
if I query a multi-word value with a space in the middle, it breaks:

{'q':'*:*', 'fl': '*',
'fq':'{!tag=city_colaboration_tag}city_colaboration:"College Station"',
'json.facet': '{city_colaboration:{type:terms, field:city_colaboration,
limit:5000, domain:{excludeTags:city_colaboration_tag}}}'}


All of this works fine with the GET method, but breaks with the POST method.


Below is the portion of the log. I really appreciate your help.

Regards,
Koji



01:49 ERROR RequestHandlerBase
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'city_colaboration:"College': Lexical error at line 1, column 34. Encountered: <EOF> after : "\"College"
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'cidade_colaboracao_exact:"College': Lexical error at line 1, column 34. Encountered: <EOF> after : "\"College"
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:219)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:270)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)


Re: Schemaless mode question

2018-04-17 Thread Kojo
Shawn,
I first deleted the collection from the admin interface. It didn't work.

When I deleted it directly on the command line, it worked:
 /opt/solr-6.6.2/bin/solr delete -c 


Thanks for the advice about using schemaless mode in production. I understand
the potential problems, so I will first create the schema automagically in
schemaless mode, then download it, adjust it, and upload it to ZooKeeper as
described in the documentation.
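
If it helps anyone else, the commands I have in mind look something like this
(the config name and paths are placeholders, not my real setup):

 /opt/solr-6.6.2/bin/solr zk downconfig -z localhost:9983 -n my_config -d /tmp/my_config
 (edit /tmp/my_config/conf/managed-schema)
 /opt/solr-6.6.2/bin/solr zk upconfig -z localhost:9983 -n my_config -d /tmp/my_config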

Thanks,
Robson




2018-04-17 11:49 GMT-03:00 Shawn Heisey <apa...@elyograg.org>:

> On 4/17/2018 8:15 AM, Kojo wrote:
> > I am trying schemaless mode and it seems to work very nicely, and there is
> > no overhead to write a custom schema for each type of collection that we
> > need to index.
> > However we are facing a strange problem. Once we have created a collection
> > and indexed data on it, if we need to make some change to the data (change
> > a data type), even if we delete the collection, restart all Solr
> > instances, and create the collection again, the new auto schema is not
> > recreated and the former auto-generated schema is still there.
> >
> > The only workaround that i have found to solve this, is to create a new
> > collection with a different name.
> >
> > Is this a known bug on Solr 6.6 or am I missing something?
>
> We recommend NOT using schemaless mode in production.  It is not always
> able to make the right guess for the fieldType of the data it
> encounters.  In production, it's generally better to have Solr throw an
> error when it encounters unknown fields, and then for you to manually
> adjust the schema with the fieldType that is correct for the new field.
> If the wrong guess is made and you have to change the schema, then you
> will have to re-index.
>
> Without knowing precisely what steps/commands/requests you used for all
> of the actions you have described, it is very difficult to know if
> there's a problem with Solr or if one of the steps taken was incorrect.
> Can you fill in the details?
>
> Thanks,
> Shawn
>
>


Re: Schemaless mode question

2018-04-17 Thread Kojo
I have just deleted it using the command line and it worked as expected!



2018-04-17 11:15 GMT-03:00 Kojo <rbsnk...@gmail.com>:

> Hi all,
>
> I am trying schemaless mode and it seems to work very nicely, and there is
> no overhead to write a custom schema for each type of collection that we
> need to index.
> However we are facing a strange problem. Once we have created a collection
> and indexed data on that collection, if we need to make some change to the
> data (change a data type), even if we delete the collection, restart all
> Solr instances, and create the collection again, the new auto schema is not
> recreated and the former auto-generated schema is still there.
>
> The only workaround that I have found to solve this is to create a new
> collection with a different name.
>
> Is this a known bug on Solr 6.6 or am I missing something?
>
> Thanks in advance,
>
>


Schemaless mode question

2018-04-17 Thread Kojo
Hi all,

I am trying schemaless mode and it seems to work very nicely, and there is
no overhead to write a custom schema for each type of collection that we
need to index.
However we are facing a strange problem. Once we have created a collection
and indexed data on that collection, if we need to make some change to the
data (change a data type), even if we delete the collection, restart all
Solr instances, and create the collection again, the new auto schema is not
recreated and the former auto-generated schema is still there.

The only workaround that I have found to solve this is to create a new
collection with a different name.

Is this a known bug on Solr 6.6 or am I missing something?

Thanks in advance,


Re: Solr cloud schema and schemaless

2018-04-04 Thread Kojo
Many thanks Erick.
I think we found the issue regarding schemaless. The source file has to
follow a specific format, and we were trying to index a file that did not
follow the Solr XML standard.
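
For reference, a minimal document in the Solr XML update format (the field
names are just examples):

  <add>
    <doc>
      <field name="id">1</field>
      <field name="title">Example document</field>
    </doc>
  </add>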

Also, thanks for the advice about field types. These schemaless collections
will all be fed from the same source, hence normalized. I hope...

Koji



2018-04-04 0:57 GMT-03:00 Erick Erickson <erickerick...@gmail.com>:

> The schema mode is _per collection_, not per node. So there's no trouble
> mixing
> replicas from collection A running schema model 1 with replicas from
> collection B
> running a different schema model.
>
> That said, schemaless is _not_ recommended for production unless you have
> total control over the ETL chain and can guarantee that documents conform
> to
> some standard. Schemaless does its best, but it guesses based on the
> first time it
> sees a field. So if the first doc has field X with a value of 1, it
> infers that this field
> is an int type. If doc2 has a value of 1.0, the doc fails with a parsing
> error.
>
> FYI,
> Erick
>
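
To illustrate Erick's per-collection point, each collection can be created
from its own configset; a sketch using the configsets bundled with Solr 6.x
(the collection names are made up):

  # classic, schema-first collection
  bin/solr create -c with_schema -d basic_configs

  # schemaless (field-guessing) collection on the same nodes
  bin/solr create -c guessing -d data_driven_schema_configs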
> On Tue, Apr 3, 2018 at 2:39 PM, Kojo <rbsnk...@gmail.com> wrote:
> > Hi Solrs,
> > We have a Solr Cloud running on three nodes.
> > Five collections are running in schema mode and we would like to create
> > another collection running schemaless.
> >
> > Can schema and schemaless collections fit together on the same nodes?
> >
> > I am not sure, because on the page below Solr is started in schemaless
> > mode, but I start Solr Cloud without this option.
> >
> > https://lucene.apache.org/solr/guide/6_6/schemaless-mode.html
> >
> > bin/solr start -e schemaless
> >
> >
> >
> > Thank you all!
>
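
To make the type-guessing failure Erick describes concrete, a hypothetical
pair of JSON updates (the field names are invented):

  [{"id": "1", "x": 1}]      <- schemaless guesses that x is a numeric (long) field
  [{"id": "2", "x": 1.0}]    <- rejected: "1.0" cannot be parsed as a long

At that point the schema has to be fixed by hand and the data re-indexed.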


Solr cloud schema and schemaless

2018-04-03 Thread Kojo
Hi Solrs,
We have a Solr Cloud running on three nodes.
Five collections are running in schema mode and we would like to create
another collection running schemaless.

Can schema and schemaless collections fit together on the same nodes?

I am not sure, because on the page below Solr is started in schemaless mode,
but I start Solr Cloud without this option.

https://lucene.apache.org/solr/guide/6_6/schemaless-mode.html

bin/solr start -e schemaless



Thank you all!


Re: parallel - cartesianProduct

2018-01-29 Thread Kojo
Joel,
The Jira is created:
https://issues.apache.org/jira/browse/SOLR-11922

I hope it helps.

Thank you very much.




2018-01-29 13:03 GMT-02:00 Joel Bernstein <joels...@gmail.com>:

> This looks like a bug in the CartesianProductStream. It's going to have to be
> fixed before parallel cartesian products can be run. Feel free to create a
> jira for this.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Jan 29, 2018 at 9:58 AM, Kojo <rbsnk...@gmail.com> wrote:
>
> > Hi solr-users!
> > I have a Streaming Expression which joins two search SE, one of them is
> > evaluated on a cartesianProduct SE.
> > I'm trying to run that in parallel mode, but it does not work.
> >
> >
> > Trying a very simple parallel I can see that it works:
> >
> > parallel(
> >   search(
> >
> >
> >
> > But this one, which I'm trying to run, doesn't work:
> >
> > parallel(
> > rollup(
> > sort(
> > hashJoin(
> >   search(
> >   hashed=cartesianProduct(
> > search(
> >
> >
> >
> > The simplified version of the above doesn't work either:
> >
> > parallel(
> >cartesianProduct(
> > search(
> >
> >
> > The error is below; do you have any hint on how I can fix the
> > expression?
> >
> > Thank you.
> >
> >
> >
> > java.io.IOException: java.lang.NullPointerException
> > at org.apache.solr.client.solrj.io.stream.ParallelStream.constructStreams(ParallelStream.java:277)
> > at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:305)
> > at org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:51)
> > at org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:535)
> > at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:83)
> > at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
> > at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
> > at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
> > at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
> > at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
> > at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
> > at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
> > at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
> > at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
> > at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
> > at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
> > at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
> > at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
> > at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
> > at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
> > at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
> > at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
> > at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
> > at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
> > at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
> > at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
> > at org.eclipse.jetty.server.handler.HandlerWrap

parallel - cartesianProduct

2018-01-29 Thread Kojo
Hi solr-users!
I have a Streaming Expression which joins two search SE, one of them is
evaluated on a cartesianProduct SE.
I'm trying to run that in parallel mode, but it does not work.


Trying a very simple parallel I can see that it works:

parallel(
  search(



But this one, which I'm trying to run, doesn't work:

parallel(
rollup(
sort(
hashJoin(
  search(
  hashed=cartesianProduct(
search(



The simplified version of the above doesn't work either:

parallel(
   cartesianProduct(
search(


The error is below; do you have any hint on how I can fix the expression?

Thank you.



java.io.IOException: java.lang.NullPointerException
at org.apache.solr.client.solrj.io.stream.ParallelStream.constructStreams(ParallelStream.java:277)
at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:305)
at org.apache.solr.client.solrj.io.stream.ExceptionStream.open(ExceptionStream.java:51)
at org.apache.solr.handler.StreamHandler$TimerStream.open(StreamHandler.java:535)
at org.apache.solr.client.solrj.io.stream.TupleStream.writeMap(TupleStream.java:83)
at org.apache.solr.response.JSONWriter.writeMap(JSONResponseWriter.java:547)
at org.apache.solr.response.TextResponseWriter.writeVal(TextResponseWriter.java:193)
at org.apache.solr.response.JSONWriter.writeNamedListAsMapWithDups(JSONResponseWriter.java:209)
at org.apache.solr.response.JSONWriter.writeNamedList(JSONResponseWriter.java:325)
at org.apache.solr.response.JSONWriter.writeResponse(JSONResponseWriter.java:120)
at org.apache.solr.response.JSONResponseWriter.write(JSONResponseWriter.java:71)
at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:65)
at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:809)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:538)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:534)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException
at org.apache.solr.client.solrj.io.stream.CartesianProductStream.toExpression(CartesianProductStream.java:154)
at org.apache.solr.client.solrj.io.stream.CartesianProductStream.toExpression(CartesianProductStream.java:134)
at org.apache.solr.client.solrj.io.stream.CartesianProductStream.toExpression(CartesianProductStream.java:44)
at org.apache.solr.client.solrj.io.stream.ParallelStream.constructStreams(ParallelStream.java:255)


Re: hashJoin - Multivalued field

2018-01-23 Thread Kojo
I'm sorry, everything is working fine!

2018-01-23 16:44 GMT-02:00 Kojo <rbsnk...@gmail.com>:

> I am trying to solve one problem, exactly as the case described here:
>
> http://lucene.472066.n3.nabble.com/Streaming-expression-API-innerJoin-on-multi-valued-field-td4353794.html
>
> I cannot accomplish that on Solr 6.6; my streaming expression returns
> nothing:
>
>
> hashJoin(
>   search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number",
> sort="p_number asc"),
>   hashed=cartesianProduct(
>   search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO
> *]", fl="processes, id", sort="id asc"),
>   processes,
>   ),
>   on="p_number=processes"
> )
>
> Both fields are of type string.
>
>
> One strange thing is that if I filter the first query using fq, some
> results appear.
>
> hashJoin(
>   search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number",
> sort="p_number asc", fq= "sch_id:905 OR sch_id:3487"),
>   hashed=cartesianProduct(
>   search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO
> *]", fl="processes, id", sort="id asc"),
>   processes,
>   ),
>   on="p_number=processes"
> )
>
>
>
> {
>   "result-set": {
> "docs": [
>   {
> "processes": "00/01011-6",
> "p_number": "00/01011-6",
> "id": "43256"
>   },
>   {
> "processes": "97/13133-4",
> "p_number": "97/13133-4",
> "id": "43256"
>   },
>   {
> "EOF": true,
> "RESPONSE_TIME": 343
>   }
> ]
>   }
> }
>
>
> Can you help me, please?
>


hashJoin - Multivalued field

2018-01-23 Thread Kojo
I am trying to solve one problem, exactly as the case described here:

http://lucene.472066.n3.nabble.com/Streaming-expression-API-innerJoin-on-multi-valued-field-td4353794.html

I cannot accomplish that on Solr 6.6; my streaming expression returns
nothing:


hashJoin(
  search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number",
sort="p_number asc"),
  hashed=cartesianProduct(
  search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO
*]", fl="processes, id", sort="id asc"),
  processes,
  ),
  on="p_number=processes"
)

Both fields are of type string.


One strange thing is that if I filter the first query using fq, some
results appear.

hashJoin(
  search(scholarship, zkHost="localhost:9983", q=*:*, fl="p_number",
sort="p_number asc", fq= "sch_id:905 OR sch_id:3487"),
  hashed=cartesianProduct(
  search(articles, zkHost="localhost:9983", q=*:*, fq="processes:[1 TO
*]", fl="processes, id", sort="id asc"),
  processes,
  ),
  on="p_number=processes"
)



{
  "result-set": {
"docs": [
  {
"processes": "00/01011-6",
"p_number": "00/01011-6",
"id": "43256"
  },
  {
"processes": "97/13133-4",
"p_number": "97/13133-4",
"id": "43256"
  },
  {
"EOF": true,
"RESPONSE_TIME": 343
  }
]
  }
}


Can you help me, please?


Re: docValues

2017-11-24 Thread Kojo
Erick,
thanks for explaining the memory aspects.

Regarding the end-user perspective, our intention is to provide a first
layer of filtering, where data will be rolled up into buckets and
displayed in charts and tables.
When I talked about providing access to "full" documents, it was not to
display them on the web, but to allow the researcher to download the data
and dive into it with his own tools (R, SPSS, whatever).

With this in mind, using the /select handler is the only solution I have
found to get fields other than docValues.

Now that it is a little clearer to me that memory will not be heavily
affected if I use docValues, I will start to think about disk usage growth
and how much it impacts the infrastructure.
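
As a reference for the cursorMark paging Erick suggests below, a minimal
sketch in Python (host, collection, and field names are invented; the sort
must include the uniqueKey field):

  import requests

  url = "http://localhost:8983/solr/articles/select"
  params = {"q": "*:*", "rows": 100, "sort": "id asc",
            "cursorMark": "*", "wt": "json"}

  while True:
      rsp = requests.get(url, params=params).json()
      for doc in rsp["response"]["docs"]:
          pass  # process each document here
      if rsp["nextCursorMark"] == params["cursorMark"]:
          break  # cursor did not advance: all results fetched
      params["cursorMark"] = rsp["nextCursorMark"]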

Thanks again,









2017-11-24 16:16 GMT-02:00 Erick Erickson <erickerick...@gmail.com>:

> Kojo:
>
> bq: My question is, isn't it too
> expensive in terms of memory consumption to enable docValues on fields that
> I don't need to facet, search, etc?
>
> Well, yes and no. The memory consumed is your OS memory space and a
> small bit of control structures on your Java heap. It's a bit scary
> that your _index_ size will increase significantly on disk, but your
> Java heap requirements won't be correspondingly large.
>
> But there's a bigger issue here. Streaming is built to handle very
> large result sets in a map/reduce style form, i.e. subdivide the work
> amongst lots of nodes. If you want to return _all_ the records to the
> user along with description information and the like, what are they
> going to do with them? 10,000,000 rows (small by some streaming
> operations standards) is far too many to, say, display in a browser.
> And it's an anti-pattern to ask for, say, 10,000,000 rows with the
> select handler.
>
> You can page through these results, but it'll take a long time. So
> basically my question is whether this capability is useful enough to
> spend time on. If it is and you are going to return lots of rows
> consider paging through with cursorMark capabilities, see:
> https://lucidworks.com/2013/12/12/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/
>
> Best,
> Erick
>
> On Fri, Nov 24, 2017 at 9:38 AM, Kojo <rbsnk...@gmail.com> wrote:
> > I think I found the solution: after analysis, switch from the /export
> > request handler to the /select request handler in order to obtain other
> > fields.
> > I will try that.
> >
> >
> >
> > 2017-11-24 15:15 GMT-02:00 Kojo <rbsnk...@gmail.com>:
> >
> >> Thank you very much for your answer, Shawn.
> >>
> >> That is it, I was looking for another way to include non-docValues
> >> fields in the filtered result documents.
> >> I can enable docValues on other fields and reindex everything if
> >> necessary. I will tell you about the use case, because I am not sure
> >> that I am on the right track.
> >>
> >> As I said before, I am using Streaming Expressions to deal with
> different
> >> collections. Up to this moment, it is decided that we will use this
> >> approach.
> >>
> >> The goal is to provide our users a web interface where they can make
> some
> >> queries. The backend will get Solr data using the Streaming Expressions
> >> rest api and will return rolled up data to the frontend, which will
> display
> >> some charts and aggregated data.
> >> After that, the end user may want to have data used to generate this
> >> aggregated information (not all fields of the filtered documents, but
> the
> >> fields used to aggregate information), combined with some other fields
> >> (title, description of document for example) which are not docValues. As
> >> you said I need to add docValues to then. My question is, isn´t it to
> >> expensive in terms of memory consumption to enable docValues on fields
> that
> >> I dont need to facet, search etc?
> >>
> >> I think that to reconstruct a standard query that achieves the results
> >> from a complex Streaming Expression is not simple. This is why I want to
> >> use the same query used to make analysis, to return full data via export
> >> handler.
> >>
> >> I am sorry if this is so much confusing.
> >>
> >> Thank you,
> >>
> >>
> >>
> >>
> >> 2017-11-24 12:36 GMT-02:00 Shawn Heisey <apa...@elyograg.org>:
> >>
> >>> On 11/23/2017 1:51 PM, Kojo wrote:
> >>>
> >>>> I am working on Solr to develop a tool for analysis. I am using
> >>>> search
> >>>> functi

Re: docValues

2017-11-24 Thread Kojo
I think I found the solution: after analysis, switch from the /export
request handler to the /select request handler in order to obtain other
fields. I will try that.
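
A sketch of what that would look like (collection and field names are
invented); /select does not require docValues on the returned fields, though
it only returns the rows requested:

  search(articles,
         zkHost="localhost:9983",
         qt="/select",
         rows="100",
         q="*:*",
         fl="id, title, description",
         sort="id asc")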



2017-11-24 15:15 GMT-02:00 Kojo <rbsnk...@gmail.com>:

> Thank you very much for your answer, Shawn.
>
> That is it, I was looking for another way to include non-docValues fields
> in the filtered result documents.
> I can enable docValues on other fields and reindex everything if necessary.
> I will tell you about the use case, because I am not sure that I am on the
> right track.
>
> As I said before, I am using Streaming Expressions to deal with different
> collections. Up to this moment, it is decided that we will use this
> approach.
>
> The goal is to provide our users a web interface where they can make some
> queries. The backend will get Solr data using the Streaming Expressions
> rest api and will return rolled up data to the frontend, which will display
> some charts and aggregated data.
> After that, the end user may want to have data used to generate this
> aggregated information (not all fields of the filtered documents, but the
> fields used to aggregate information), combined with some other fields
> (title, description of the document, for example) which are not docValues. As
> you said, I need to add docValues to them. My question is, isn't it too
> expensive in terms of memory consumption to enable docValues on fields that
> I don't need to facet, search, etc?
>
> I think that reconstructing a standard query that achieves the results
> of a complex Streaming Expression is not simple. This is why I want to
> use the same query used for the analysis to return the full data via the
> export handler.
>
> I am sorry if this is so confusing.
>
> Thank you,
>
>
>
>
> 2017-11-24 12:36 GMT-02:00 Shawn Heisey <apa...@elyograg.org>:
>
>> On 11/23/2017 1:51 PM, Kojo wrote:
>>
>> I am working on Solr to develop a tool for analysis. I am using the
>> search function of Streaming Expressions, which requires a field to be
>> indexed with docValues enabled so I can get it.
>>
>> Suppose that someone finishes the analysis and would like to get other
>> fields of the result set that are not docValues enabled. How can that
>> be done?
>>>
>>
>> We did get this message, but it's confusing as to exactly what you're
>> asking, which is why nobody responded.
>>
>> If you're saying that this theoretical person wants to use another field
>> with the streaming expression analysis you have provided, and that field
>> does not have docValues, then you'll need to add docValues to the field and
>> completely reindex.
>>
>> If you're asking something else, then you're going to need to provide
>> more details so we can actually know what you want to have happen.
>>
>> Thanks,
>> Shawn
>>
>
>


Re: docValues

2017-11-24 Thread Kojo
Thank you very much for your answer, Shawn.

That is it, I was looking for another way to include non-docValues fields
in the filtered result documents.
I can enable docValues on other fields and reindex everything if necessary.
I will tell you about the use case, because I am not sure that I am on the
right track.

As I said before, I am using Streaming Expressions to deal with different
collections. Up to this moment, it is decided that we will use this
approach.

The goal is to provide our users a web interface where they can make some
queries. The backend will get Solr data using the Streaming Expressions
rest api and will return rolled up data to the frontend, which will display
some charts and aggregated data.
After that, the end user may want to have data used to generate this
aggregated information (not all fields of the filtered documents, but the
fields used to aggregate information), combined with some other fields
(title, description of the document, for example) which are not docValues. As
you said, I need to add docValues to them. My question is, isn't it too
expensive in terms of memory consumption to enable docValues on fields that
I don't need to facet, search, etc?

I think that reconstructing a standard query that achieves the results of
a complex Streaming Expression is not simple. This is why I want to use the
same query used for the analysis to return the full data via the export handler.

I am sorry if this is so confusing.

Thank you,




2017-11-24 12:36 GMT-02:00 Shawn Heisey <apa...@elyograg.org>:

> On 11/23/2017 1:51 PM, Kojo wrote:
>
>> I am working on Solr to develop a tool for analysis. I am using the search
>> function of Streaming Expressions, which requires a field to be indexed
>> with docValues enabled so I can get it.
>>
>> Suppose that someone finishes the analysis and would like to get other
>> fields of the result set that are not docValues enabled. How can that be
>> done?
>>
>
> We did get this message, but it's confusing as to exactly what you're
> asking, which is why nobody responded.
>
> If you're saying that this theoretical person wants to use another field
> with the streaming expression analysis you have provided, and that field
> does not have docValues, then you'll need to add docValues to the field and
> completely reindex.
>
> If you're asking something else, then you're going to need to provide more
> details so we can actually know what you want to have happen.
>
> Thanks,
> Shawn
>


Fwd: docValues

2017-11-24 Thread Kojo
Hi,
yesterday I sent the message below to this list, but just after I sent it I
received an e-mail from the mail server saying that my e-mail bounced. I
don't know what that means, and since I received no answer to the question,
I don't know whether the message arrived at the list or not.
I appreciate your attention.

Thank you,




-- Forwarded message --
From: Kojo <rbsnk...@gmail.com>
Date: 2017-11-23 18:51 GMT-02:00
Subject: docValues
To: solr-user@lucene.apache.org


Hi,
I am working on Solr to develop a tool for analysis. I am using the search
function of Streaming Expressions, which requires a field to be indexed
with docValues enabled so I can get it.

Suppose that someone finishes the analysis and would like to get other
fields of the result set that are not docValues enabled. How can that be
done?

Thanks


docValues

2017-11-23 Thread Kojo
Hi,
I am working on Solr to develop a tool for analysis. I am using the search
function of Streaming Expressions, which requires a field to be indexed
with docValues enabled so I can get it.

Suppose that someone finishes the analysis and would like to get other
fields of the result set that are not docValues enabled. How can that be
done?

Thanks


Re: Streaming Expression usage

2017-11-08 Thread Kojo
We have a web site with traditional search capabilities, faceting, sorting
and so on. It had many problems before we took over, and we need to
refactor it all. It has a single index for different types of documents,
and it holds a very small amount of data.

The situation is that we are developing a PoC to expand the system in order
to have room for different types of government data, such as educational,
labor, financial and so on. For this expansion we thought it would be
better to start thinking about a scalable system to accommodate multiple
collections on Solr Cloud. The users will be researchers analyzing data, but
I would like to offer a first layer of filtering to export data for them.
To develop this PoC, we are refactoring the system I've mentioned at the
beginning. Because of our requirements to build a scalable system with
multiple indexes, hundreds of millions of documents, Solr Cloud, etc., when I
read about Streaming Expressions I thought it would be a good approach.

When you talk about efficiency, are you talking about the need to write
more code since the traditional Solr API may be less expensive, or are you
talking about efficiency in terms of performance?
Your opinion is very much appreciated.








2017-11-08 12:35 GMT-02:00 Joel Bernstein <joels...@gmail.com>:

> It would be useful if you could describe your use case more fully. For
> example are the users looking mainly for search results with facets? Or are
> they looking for more flexibility and data analysis capabilities.
>
> Streaming Expressions really lends itself to non-traditional search use
> cases. If you're planning a traditional search interface then Streaming
> Expressions is going to be the less efficient approach.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Wed, Nov 8, 2017 at 7:44 AM, Kojo <rbsnk...@gmail.com> wrote:
>
> > Amrit,
> > as far as I understand, in your example I have the resulting documents
> > aggregated by the rollup function, but to get the documents themselves I
> > need to make another query, which will hit fq's cached results; is that
> > correct?
> >
> >
> > And thanks for pointing out fq in Streaming Expressions. I was looking
> > for that but hadn't found it.
> >
> >
> >
> >
> >
> >
> > 2017-11-08 2:35 GMT-02:00 Amrit Sarkar <sarkaramr...@gmail.com>:
> >
> > > Kojo,
> > >
> > > Not sure what you mean by making two requests to get documents. A
> > > "search" streaming expression can be passed an "fq" parameter to
> > > filter the results, and a rollup on top of that will fetch the desired
> > > results. This may not be mentioned in the official docs:
> > >
> > > Sample streaming expression:
> > >
> > > expr=rollup(
> > > >
> > > > search(collection1,
> > > >
> > > > zkHost="localhost:9983",
> > > >
> > > > qt="/export",
> > > >
> > > > q="*:*",
> > > >
> > > > fq="a_s:filter_a",
> > > >
> > > > fl="id,a_s,a_i,a_f",
> > > >
> > > > sort="a_f asc"),
> > > >
> > > > over=a_f)
> > > >
> > > >
> > > Amrit Sarkar
> > > Search Engineer
> > > Lucidworks, Inc.
> > > 415-589-9269
> > > www.lucidworks.com
> > > Twitter http://twitter.com/lucidworks
> > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> > > Medium: https://medium.com/@sarkaramrit2
> > >
> > > On Wed, Nov 8, 2017 at 7:41 AM, Kojo <rbsnk...@gmail.com> wrote:
> > >
> > > > Hi,
> > > > I am working on a PoC of a front-end web app to provide an interface
> > > > for the end user to search and filter data on Solr indexes.
> > > >
> > > > I have been trying Streaming Expressions for about a week and I am
> > > > fairly keen about using them to search and filter indexes on the
> > > > Solr side. But I am not sure whether this is the right approach or
> > > > not.
> > > >
> > > > A simple question to illustrate my doubts: if I use search and some
> > > > more Streaming Expressions to filter the indexes and get documents,
> > > > and I then want to roll up the result, will I have to make two
> > > > requests? Is this a good use for Streaming Expressions?
> > > >
> > >
> >
>


Re: Streaming Expression usage

2017-11-08 Thread Kojo
Amrit,
as far as I understand, in your example I have the resulting documents
aggregated by the rollup function, but to get the documents themselves I
need to make another query, which will hit fq's cached results; is that
correct?


And thanks for pointing out fq in Streaming Expressions. I was looking
for that but hadn't found it.






2017-11-08 2:35 GMT-02:00 Amrit Sarkar <sarkaramr...@gmail.com>:

> Kojo,
>
> Not sure what you mean by making two requests to get documents. A
> "search" streaming expression can be passed an "fq" parameter to filter
> the results, and a rollup on top of that will fetch the desired results.
> This may not be mentioned in the official docs:
>
> Sample streaming expression:
>
> expr=rollup(
> >
> > search(collection1,
> >
> > zkHost="localhost:9983",
> >
> > qt="/export",
> >
> > q="*:*",
> >
> > fq="a_s:filter_a",
> >
> > fl="id,a_s,a_i,a_f",
> >
> > sort="a_f asc"),
> >
> > over=a_f)
> >
> >
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Wed, Nov 8, 2017 at 7:41 AM, Kojo <rbsnk...@gmail.com> wrote:
>
> > Hi,
> > I am working on a PoC of a front-end web app to provide an interface for
> > the end user to search and filter data on Solr indexes.
> >
> > I have been trying Streaming Expressions for about a week and I am fairly
> > keen about using them to search and filter indexes on the Solr side. But
> > I am not sure whether this is the right approach or not.
> >
> > A simple question to illustrate my doubts: if I use search and some more
> > Streaming Expressions to filter the indexes and get documents, and I then
> > want to roll up the result, will I have to make two requests? Is this a
> > good use for Streaming Expressions?
> >
>


Streaming Expression usage

2017-11-07 Thread Kojo
Hi,
I am working on a PoC of a front-end web app to provide an interface for the
end user to search and filter data on Solr indexes.

I have been trying Streaming Expressions for about a week and I am fairly
keen about using them to search and filter indexes on the Solr side. But I am
not sure whether this is the right approach or not.

A simple question to illustrate my doubts: if I use search and some more
Streaming Expressions to filter the indexes and get documents, and I then
want to roll up the result, will I have to make two requests? Is this a good
use for Streaming Expressions?


Re: App Studio

2017-11-01 Thread Kojo
I would like to try that!


On 1 Nov 2017 18:04, "Will Hayes"  wrote:

There is a community edition of App Studio for Solr and Elasticsearch being
released by Lucidworks in November. Drop me a line if you would like to get
a preview release.
-wh

--
Will Hayes | CEO | Lucidworks
direct. +1.415.997.9455 | email. w...@lucidworks.com

On Wed, Nov 1, 2017 at 12:54 PM, David Hastings <
hastings.recurs...@gmail.com> wrote:

> Hey all, at the conference it was mentioned that Lucidworks would release
> App Studio as its own free project. Is that still the case?
>


Re: Streaming Expression - cartesianProduct

2017-11-01 Thread Kojo
Pratik's information answered the question.

Thanks!



On 1 Nov 2017 19:45, "Amrit Sarkar" <sarkaramr...@gmail.com> wrote:

Following Pratik's spot-on comment and not really related to your question,

Even the "partitionKeys" parameter needs to be set to the "over" field
while using "parallel" streaming.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Nov 2, 2017 at 2:38 AM, Pratik Patel <pra...@semandex.net> wrote:

> Rollup needs the documents to be sorted by the "over" field.
> Check this for more details:
> http://lucene.472066.n3.nabble.com/Streaming-Expressions-rollup-function-returning-results-with-duplicate-tuples-td4342398.html
>
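
A sketch of the fix Pratik suggests: sort the stream on the rollup field
before rolling up (same hypothetical collection and field names as in the
expressions quoted below):

  rollup(
    sort(
      cartesianProduct(
        having(
          cartesianProduct(
            search(schoolarship, zkHost="localhost:9983", qt="/export",
                   q="*:*", fl="process, area, id_researcher, status",
                   sort="process asc"),
            area),
          eq(area, val(Anything))),
        id_researcher),
      by="id_researcher asc"),
    over=id_researcher, count(*))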
> On Wed, Nov 1, 2017 at 3:41 PM, Kojo <rbsnk...@gmail.com> wrote:
>
> > Wrapping the cartesianProduct function with the fetch function works as expected.
> >
> > But the rollup function over cartesianProduct doesn't aggregate on a
> > returned field of the cartesianProduct.
> >
> >
> > The field "id_researcher" below is a multivalued field:
> >
> >
> >
> > This one works:
> >
> >
> > fetch(reasercher,
> >
> > cartesianProduct(
> > having(
> > cartesianProduct(
> > search(schoolarship,zkHost="localhost:9983",qt="/export",
> > q="*:*",
> > fl="process, area, id_reasercher",sort="process asc"),
> > area
> > ),
> > eq(area, val(Anything))),
> > id_reasercher),
> > fl="name, django_id",
> > on="id_reasercher=django_id"
> > )
> >
> >
> > This one doesn't work:
> >
> > rollup(
> >
> > cartesianProduct(
> > having(
> > cartesianProduct(
> > search(schoolarship,zkHost="localhost:9983",qt="/export",
> > q="*:*",
> > fl="process, area, id_researcher, status",sort="process asc"),
> > area
> > ),
> > eq(area, val(Anything))),
> > id_researcher),
> > over=id_researcher,count(*)
> > )
> >
> > If I aggregate over a non-multivalued field, it works.
> >
> >
> > Is that correct, rollup doesn't work on a cartesianProduct?
> >
>


Streaming Expression - cartesianProduct

2017-11-01 Thread Kojo
Wrapping the cartesianProduct function with the fetch function works as expected.

But the rollup function over cartesianProduct doesn't aggregate on a returned
field of the cartesianProduct.


The field "id_researcher" below is a multivalued field:



This one works:


fetch(reasercher,

cartesianProduct(
having(
cartesianProduct(
search(schoolarship,zkHost="localhost:9983",qt="/export",q="*:*",
fl="process, area, id_reasercher",sort="process asc"),
area
),
eq(area, val(Anything))),
id_reasercher),
fl="name, django_id",
on="id_reasercher=django_id"
)


This one doesn't work:

rollup(

cartesianProduct(
having(
cartesianProduct(
search(schoolarship,zkHost="localhost:9983",qt="/export",q="*:*",
fl="process, area, id_researcher, status",sort="process asc"),
area
),
eq(area, val(Anything))),
id_researcher),
over=id_researcher,count(*)
)

If I aggregate over a non-multivalued field, it works.


Is that correct, rollup doesn't work on a cartesianProduct?


Re: Graph Traversal

2017-10-31 Thread Kojo
Everything is working fine; this functional programming style is amazing.
Thank you!

2017-10-31 12:31 GMT-02:00 Kojo <rbsnk...@gmail.com>:

> Thank you, I am just starting with Streaming Expressions. I will try this
> one later.
>
> I will open another thread, because I can't do some simple queries using
> Streaming Expressions.
>
>
>
>
> 2017-10-30 12:11 GMT-02:00 Pratik Patel <pra...@semandex.net>:
>
>> You use this at query time. Since Streaming Expressions can be pipelined,
>> the next stage/function of the pipeline will work on the new tuples generated.
>>
>> On Mon, Oct 30, 2017 at 10:09 AM, Kojo <rbsnk...@gmail.com> wrote:
>>
>> > Do you store these new tuples, created by Streaming Expressions, in a new
>> > Solr Cloud collection? Or just use these tuples at query time?
>> >
>> > 2017-10-30 11:00 GMT-02:00 Pratik Patel <pra...@semandex.net>:
>> >
>> > > By including the cartesianProduct function in a Streaming Expression
>> > > pipeline, you can convert a tuple having one multivalued field into
>> > > multiple tuples, where each tuple holds one value for the field that
>> > > was originally multivalued.
>> > >
>> > > For example, if you have the following document:
>> > >
>> > > { id: someID, fruits: [apple, orange, banana] }   // fruits is a
>> > > multivalued field
>> > >
>> > >
>> > > Applying the cartesianProduct function would give the following tuples:
>> > >
>> > > { id: someID, fruits: apple }, { id: someID, fruits: orange },
>> > > { id: someID, fruits: banana }
>> > >
>> > >
>> > > Now that fruits holds single values, you can also use any Streaming
>> > > Expression functions that don't work with multivalued fields. This
>> > > happens in the Streaming Expression pipeline, so you don't have to
>> > > flatten your documents in the index.
>> > >
>> > > On Mon, Oct 30, 2017 at 8:39 AM, Kojo <rbsnk...@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > > just a question, I have no deep background on Solr, Graph etc.
>> > > > This solution looks like normalizing data, like an m2m table in a
>> > > > SQL database, isn't it?
>> > > >
>> > > >
>> > > >
>> > > > 2017-10-29 21:51 GMT-02:00 Pratik Patel <pra...@semandex.net>:
>> > > >
>> > > > > For now, you can probably use Cartesian function of Streaming
>> > > Expressions
>> > > > > which Joel implemented to solve the same problem.
>> > > > >
>> > > > > https://issues.apache.org/jira/browse/SOLR-10292
>> > > > > http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-coming-in-solr-66.html
>> > > > >
>> > > > > Regards,
>> > > > > Pratik
>> > > > >
>> > > > > On Sat, Oct 28, 2017 at 7:38 PM, Joel Bernstein <
>> joels...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > I don't see a jira ticket for this yet. Feel free to create it
>> and
>> > > > reply
>> > > > > > back with the link.
>> > > > > >
>> > > > > > Joel Bernstein
>> > > > > > http://joelsolr.blogspot.com/
>> > > > > >
>> > > > > > On Fri, Oct 27, 2017 at 9:55 AM, Kojo <rbsnk...@gmail.com>
>> wrote:
>> > > > > >
>> > > > > > > Hi, I was looking for information on Graph Traversal. More
>> > > > > > > specifically, support for graph search on multivalued fields.
>> > > > > > >
>> > > > > > > Searching on the Internet, I found a question exactly the same
>> > > > > > > as mine, with an answer saying that what I need is not
>> > > > > > > implemented yet:
>> > > > > > > http://lucene.472066.n3.nabble.com/Using-multi-valued-field-in-solr-cloud-Graph-Traversal-Query-td4324379.html
>> > > > > > >
>> > > > > > >
>> > > > > > > Is there a ticket in Jira to follow the implementation of
>> > > > > > > graph search on multivalued fields?
>> > > > > > >
>> > > > > > > Thank you,
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>


Re: Graph Traversal

2017-10-31 Thread Kojo
Thank you, I am just starting with Streaming Expressions. I will try this
one later.

I will open another thread, because I can't do some simple queries using
Streaming Expressions.




2017-10-30 12:11 GMT-02:00 Pratik Patel <pra...@semandex.net>:

> You use this at query time. Since Streaming Expressions can be pipelined,
> the next stage/function of the pipeline will work on the new tuples generated.
>
> On Mon, Oct 30, 2017 at 10:09 AM, Kojo <rbsnk...@gmail.com> wrote:
>
> > Do you store these new tuples, created by Streaming Expressions, in a new
> > Solr Cloud collection? Or just use these tuples at query time?
> >
> > 2017-10-30 11:00 GMT-02:00 Pratik Patel <pra...@semandex.net>:
> >
> > > By including the cartesianProduct function in a Streaming Expression
> > > pipeline, you can convert a tuple having one multivalued field into
> > > multiple tuples, where each tuple holds one value for the field that
> > > was originally multivalued.
> > >
> > > For example, if you have the following document:
> > >
> > > { id: someID, fruits: [apple, orange, banana] }   // fruits is a
> > > multivalued field
> > >
> > >
> > > Applying the cartesianProduct function would give the following tuples:
> > >
> > > { id: someID, fruits: apple }, { id: someID, fruits: orange },
> > > { id: someID, fruits: banana }
> > >
> > >
> > > Now that fruits holds single values, you can also use any Streaming
> > > Expression functions that don't work with multivalued fields. This
> > > happens in the Streaming Expression pipeline, so you don't have to
> > > flatten your documents in the index.
> > >
> > > On Mon, Oct 30, 2017 at 8:39 AM, Kojo <rbsnk...@gmail.com> wrote:
> > >
> > > > Hi,
> > > > just a question, I have no deep background on Solr, Graph etc.
> > > > This solution looks like normalizing data, like an m2m table in a
> > > > SQL database, isn't it?
> > > >
> > > >
> > > >
> > > > 2017-10-29 21:51 GMT-02:00 Pratik Patel <pra...@semandex.net>:
> > > >
> > > > > For now, you can probably use Cartesian function of Streaming
> > > Expressions
> > > > > which Joel implemented to solve the same problem.
> > > > >
> > > > > https://issues.apache.org/jira/browse/SOLR-10292
> > > > > http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-coming-in-solr-66.html
> > > > >
> > > > > Regards,
> > > > > Pratik
> > > > >
> > > > > On Sat, Oct 28, 2017 at 7:38 PM, Joel Bernstein <
> joels...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > I don't see a jira ticket for this yet. Feel free to create it
> and
> > > > reply
> > > > > > back with the link.
> > > > > >
> > > > > > Joel Bernstein
> > > > > > http://joelsolr.blogspot.com/
> > > > > >
> > > > > > On Fri, Oct 27, 2017 at 9:55 AM, Kojo <rbsnk...@gmail.com>
> wrote:
> > > > > >
> > > > > > > Hi, I was looking for information on Graph Traversal. More
> > > > > > > specifically, support for graph search on multivalued fields.
> > > > > > >
> > > > > > > Searching on the Internet, I found a question exactly the same
> > > > > > > as mine, with an answer saying that what I need is not
> > > > > > > implemented yet:
> > > > > > > http://lucene.472066.n3.nabble.com/Using-multi-valued-field-in-solr-cloud-Graph-Traversal-Query-td4324379.html
> > > > > > >
> > > > > > >
> > > > > > > Is there a ticket in Jira to follow the implementation of graph
> > > > > > > search on multivalued fields?
> > > > > > >
> > > > > > > Thank you,
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Graph Traversal

2017-10-30 Thread Kojo
Do you store these new tuples, created by Streaming Expressions, in a new
Solr Cloud collection? Or just use these tuples at query time?

2017-10-30 11:00 GMT-02:00 Pratik Patel <pra...@semandex.net>:

> By including the cartesianProduct function in a Streaming Expression
> pipeline, you can convert a tuple having one multivalued field into
> multiple tuples, where each tuple holds one value for the field that was
> originally multivalued.
>
> For example, if you have the following document:
>
> { id: someID, fruits: [apple, orange, banana] }   // fruits is a
> multivalued field
>
>
> Applying the cartesianProduct function would give the following tuples:
>
> { id: someID, fruits: apple }, { id: someID, fruits: orange },
> { id: someID, fruits: banana }
>
>
> Now that fruits holds single values, you can also use any Streaming
> Expression functions that don't work with multivalued fields. This happens
> in the Streaming Expression pipeline, so you don't have to flatten your
> documents in the index.
>
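
A sketch of that pipeline in streaming-expression form (collection and
field names are invented):

  cartesianProduct(
    search(fruitStore, q="*:*", fl="id, fruits", sort="id asc", qt="/export"),
    fruits)

Each multivalued fruits value now arrives as its own tuple, so later
pipeline stages that expect single values can consume them.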
> On Mon, Oct 30, 2017 at 8:39 AM, Kojo <rbsnk...@gmail.com> wrote:
>
> > Hi,
> > just a question, I have no deep background on Solr, Graph etc.
> > This solution looks like normalizing data, like an m2m table in a SQL
> > database, isn't it?
> >
> >
> >
> > 2017-10-29 21:51 GMT-02:00 Pratik Patel <pra...@semandex.net>:
> >
> > > For now, you can probably use Cartesian function of Streaming
> Expressions
> > > which Joel implemented to solve the same problem.
> > >
> > > https://issues.apache.org/jira/browse/SOLR-10292
> > > http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-coming-in-solr-66.html
> > >
> > > Regards,
> > > Pratik
> > >
> > > On Sat, Oct 28, 2017 at 7:38 PM, Joel Bernstein <joels...@gmail.com>
> > > wrote:
> > >
> > > > I don't see a jira ticket for this yet. Feel free to create it and
> > reply
> > > > back with the link.
> > > >
> > > > Joel Bernstein
> > > > http://joelsolr.blogspot.com/
> > > >
> > > > On Fri, Oct 27, 2017 at 9:55 AM, Kojo <rbsnk...@gmail.com> wrote:
> > > >
> > > > > Hi, I was looking for information on Graph Traversal. More
> > > > > specifically, support for graph search on multivalued fields.
> > > > >
> > > > > Searching on the Internet, I found a question exactly the same as
> > > > > mine, with an answer saying that what I need is not implemented yet:
> > > > > http://lucene.472066.n3.nabble.com/Using-multi-valued-field-in-solr-cloud-Graph-Traversal-Query-td4324379.html
> > > > >
> > > > >
> > > > > Is there a ticket in Jira to follow the implementation of graph
> > > > > search on multivalued fields?
> > > > >
> > > > > Thank you,
> > > > >
> > > >
> > >
> >
>


Re: Graph Traversal

2017-10-30 Thread Kojo
Hi,
just a question, I have no deep background on Solr, Graph etc.
This solution looks like normalizing data, like an m2m table in a SQL database,
isn't it?



2017-10-29 21:51 GMT-02:00 Pratik Patel <pra...@semandex.net>:

> For now, you can probably use Cartesian function of Streaming Expressions
> which Joel implemented to solve the same problem.
>
> https://issues.apache.org/jira/browse/SOLR-10292
> http://joelsolr.blogspot.com/2017/03/streaming-nlp-is-coming-in-solr-66.html
>
> Regards,
> Pratik
>
> On Sat, Oct 28, 2017 at 7:38 PM, Joel Bernstein <joels...@gmail.com>
> wrote:
>
> > I don't see a jira ticket for this yet. Feel free to create it and reply
> > back with the link.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, Oct 27, 2017 at 9:55 AM, Kojo <rbsnk...@gmail.com> wrote:
> >
> > > Hi, I was looking for information on Graph Traversal. More
> > > specifically, support for graph search on multivalued fields.
> > >
> > > Searching on the Internet, I found a question exactly the same as mine,
> > > with an answer saying that what I need is not implemented yet:
> > > http://lucene.472066.n3.nabble.com/Using-multi-valued-field-in-solr-cloud-Graph-Traversal-Query-td4324379.html
> > >
> > >
> > > Is there a ticket in Jira to follow the implementation of graph search
> > > on multivalued fields?
> > >
> > > Thank you,
> > >
> >
>


Solr - cross collection query

2017-10-27 Thread Kojo
I am digging into graph traversal. I want to make queries crossing
collections.

Using the gatherNodes function of Streaming Expressions, it is possible, as
in the example below:

gatherNodes(logs,
            gatherNodes(emails,
                        search(emails, q="body:(solr rocks)",
                               fl="from", sort="score desc", rows="20"),
                        walk="from->from",
                        gather="to",
                        scatter="leaves, branches"),
            walk="node->user",
            fq="action:edit",
            gather="contentID")


However, as far as I understand, gatherNodes is a function for forward
queries, to get nodes that the root node points to.

On the other hand, in standard queries it is possible to make reverse
queries, to get nodes that point to the root node.



The question is, is it possible to make a standard query between
collections?

The example below demonstrates how to concatenate graph queries, but is it
possible to do it across different collections?

+{!graph from="folder_id" to="folder_id"
traversalFilter="table:document"}({!graph from="folder_id" to="parent_id"
traversalFilter="table:folder"}folder_id:123) +bar


Graph Traversal

2017-10-27 Thread Kojo
Hi, I was looking for information on Graph Traversal. More specifically,
support for graph search on multivalued fields.

Searching on the Internet, I found a question exactly the same as mine,
with an answer saying that what I need is not implemented yet:
http://lucene.472066.n3.nabble.com/Using-multi-valued-field-in-solr-cloud-Graph-Traversal-Query-td4324379.html


Is there a ticket in Jira to follow the implementation of graph search on
multivalued fields?

Thank you,