Re: Combining complex joins with other criteria

2017-11-23 Thread Mikhail Khludnev
This is my pet peeve in Solr.
When q={!join ..}..., all tokens go into the join query parser; and even if we
put a space in front of the curly brace, q= {!join ..}..., only the first token
goes to the join query parser.
I remember there was some discussion about it in comments, but I believe it was
never raised as a defect.
The JSON query syntax introduced recently might work more predictably.

On Fri, Nov 24, 2017 at 1:08 AM, David Frese 
wrote:

> Am 23.11.17 um 20:13 schrieb Mikhail Khludnev:
>
>> Hello, David.
>> It should be like
>> q=name:"Mike" AND {!join from=pid to=id v=$qq}=city:"London" AND
>> id:"a1"
>>
>>
> Thanks a lot!
>
> But this looks like working around a bug, given that
>
> >> {!join from=pid to=id}(city:"London" AND id:"a1")
>
> works fine. Is there any documentation about this? Or an open issue?
>
> --
> David Frese
> +49 7071 70896 75
>
> Active Group GmbH
> Hechinger Str. 12/1, 72072 Tübingen
> Registergericht: Amtsgericht Stuttgart, HRB 224404
> Geschäftsführer: Dr. Michael Sperber
>



-- 
Sincerely yours
Mikhail Khludnev


Solr7 org.apache.lucene.index.IndexUpgrader

2017-11-23 Thread Leo Prince
Hi,

We were using a somewhat older version, Solr 4.10.2, and are upgrading to Solr 7.

We have about 4 million records in one of the cores, which is of course pretty
huge, so re-sourcing the index is nearly impossible and re-querying from the
source Solr into Solr 7 is also going to be an exhausting effort.

Hence, I tried to upgrade the index using
org.apache.lucene.index.IndexUpgrader: first to 5, then to 6, and then to 7.
The IndexUpgrader runs went fine without issue; however, I am getting an error
when initializing the core.

Index Upgrade
===
java -cp lucene-core-5.5.4.jar:lucene-backward-codecs-5.5.4.jar
org.apache.lucene.index.IndexUpgrader -delete-prior-commits
/var/solr/data/report_shard1_replica_n1/data/index/
java -cp lucene-core-6.6.0.jar:lucene-backward-codecs-6.6.0.jar
org.apache.lucene.index.IndexUpgrader -delete-prior-commits
/var/solr/data/report_shard1_replica_n1/data/index/
java -cp lucene-core-7.1.0.jar:lucene-backward-codecs-7.1.0.jar
org.apache.lucene.index.IndexUpgrader -delete-prior-commits
/var/solr/data/report_shard1_replica_n1/data/index/

IndexUpgrader ran just fine without any errors, but I got this error when
initializing the core:

java.lang.IllegalStateException: java.lang.IllegalStateException:
unexpected docvalues type NONE for field '_version_' (expected=NUMERIC).
Re-index with correct docvalues type.

That said, I am using the classic schema, and I used the default managed-schema
file as my classic schema.xml:

./server/solr/configsets/_default/conf/managed-schema

When comparing the schema of 4.10.2 with that of 7.1.0, I see the field type
names have changed as follows:

<fieldType name="pint" class="solr.IntPointField" .../>
<fieldType name="pfloat" class="solr.FloatPointField" .../>
<fieldType name="plong" class="solr.LongPointField" .../>
<fieldType name="pdouble" class="solr.DoublePointField" .../>

Earlier, up to Solr 6, it was int, float, long and double (without the "p" at
the beginning). I read in the docs that the old Trie-based field types are
deprecated in Solr 7 and that everything should use the new "p" (Point) types,
which improve performance. Hence, in this context:

1. The error I got, java.lang.IllegalStateException: is it because my synced
and upgraded index data contains the old field types while the new Solr 7
schema contains the new field type names? As mentioned, my IndexUpgrader runs
completed without any errors.

2. How do I sort out the error in 1, if my assessment is correct? Since my data
is too large to re-source or re-query, are there any other workarounds to
migrate the index if IndexUpgrader is not an option for upgrading the index
to 7?

Please advise and thanks in advance.

Leo Prince.


Re: TimeZone issue

2017-11-23 Thread Renuka Srishti
Yes, we have the TZ parameter for that, but it only works for date math. I need
to convert the date's time zone on the client side.
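
For what it's worth, here is a minimal java.time sketch of that client-side
conversion; the sample timestamp and zone ID are only illustrative, not from
this thread:

import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

public class SolrDateToClientZone {
    public static void main(String[] args) {
        // Solr returns dates as UTC ISO-8601 strings, e.g. in JSON or CSV responses.
        String solrDate = "2017-11-23T10:15:30Z";

        // Parse the UTC instant and re-express it in the client's time zone.
        ZonedDateTime clientTime = Instant.parse(solrDate)
                                          .atZone(ZoneId.of("Asia/Kolkata"));

        // Prints 2017-11-23T15:45:30+05:30
        System.out.println(clientTime.format(DateTimeFormatter.ISO_OFFSET_DATE_TIME));
    }
}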

Thanks
Renuka Srishti

On Thu, Nov 16, 2017 at 8:51 PM, Shawn Heisey  wrote:

> On 11/16/2017 4:54 AM, Renuka Srishti wrote:
>
>> Thanks for your response Shawn. I know it deals with UTC only, but it will
>> be great if we can change the date timeZone in solr response. As I am
>> using
>> Solr CSV feature and it will be helpful if the date field in the CSV
>> result
>> can convert into client TimeZone. Please suggest if you have any alternate
>> for this.
>>
>
> As I said before, I do not think that Solr will use timezones for date
> display -- ever.  Solr does support timezones in certain circumstances, but
> I'm pretty sure that it is *only* to correctly support date math -- so Solr
> knows what time each day starts for date rounding like NOW/DAY and
> NOW/WEEK.  I have never heard of any feature that applies timezones to date
> information in responses.
>
> Timezone conversion of dates in Solr responses is something you need to do
> in the client, and should be trivial for most web development programming
> languages.
>
> Thanks,
> Shawn
>
>


Re: TimeZone issue

2017-11-23 Thread Renuka Srishti
Hi Rick,
All clients are in different time zones, so I was looking for support to
convert the date fields in the query response into a given time zone.

Thanks
Renuka Srishti

On Thu, Nov 16, 2017 at 5:58 PM, Rick Leir  wrote:

> Renuka
> Are your clients all in the same time zone? Solr should support clients in
> several timezones, and UTC conversion to local is best done in the client
> in my mind. Thanks -- Rick
>
> On November 16, 2017 6:54:47 AM EST, Renuka Srishti <
> renuka.srisht...@gmail.com> wrote:
> >Thanks for your response Shawn. I know it deals with UTC only, but it
> >will
> >be great if we can change the date timeZone in solr response. As I am
> >using
> >Solr CSV feature and it will be helpful if the date field in the CSV
> >result
> >can convert into client TimeZone. Please suggest if you have any
> >alternate
> >for this.
> >
> >Thanks
> >Renuka Srishti
> >
> >On Wed, Nov 15, 2017 at 6:16 PM, Shawn Heisey 
> >wrote:
> >
> >> On 11/15/2017 5:34 AM, Renuka Srishti wrote:
> >>
> >>> I am working on CSV export using Apache Solr. I have written all the
> >>> required query and set wt as CSV. I am getting my results as I
> >want,but
> >>> the
> >>> problem is TimeZone.
> >>>
> >>> Solr stores date value in UTC, but my client timeZone is different.
> >Is
> >>> there any way to convert date timeZone from UTC to clientTimeZone
> >direclty
> >>> in the Solr response?
> >>>
> >>
> >> Not that I know of.  UTC is the only storage/transfer method that
> >works in
> >> all situations.  Converting dates to the local timezone is a task for
> >the
> >> client, when it displays the date to a user.
> >>
> >> Typically, you would consume the response from Solr into object types
> >for
> >> the language your application is written in.  A date value in the
> >response
> >> should end up in a date object.  Date objects in most programming
> >languages
> >> have the ability to display in specific timezones.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
> --
> Sorry for being brief. Alternate email is rickleir at yahoo dot com


Re: OutOfMemoryError in 6.5.1

2017-11-23 Thread Damien Kamerman
I found the suggesters very memory hungry. I had one particularly large
index where the suggester should have been filtering a small number of
docs, but was mmap'ing the entire index. I only ever saw this behavior with
the suggesters.

On 22 November 2017 at 03:17, Walter Underwood 
wrote:

> All our customizations are in solr.in.sh. We’re using the one we
> configured for 6.3.0. I’ll check for any differences between that and the
> 6.5.1 script.
>
> I don’t see any arguments at all in the dashboard. I do see them in a ps
> listing, right at the end.
>
> java -server -Xms8g -Xmx8g -XX:+UseG1GC -XX:+ParallelRefProcEnabled
> -XX:G1HeapRegionSize=8m -XX:MaxGCPauseMillis=200 -XX:+UseLargePages
> -XX:+AggressiveOpts -XX:+HeapDumpOnOutOfMemoryError -verbose:gc
> -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps
> -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution 
> -XX:+PrintGCApplicationStoppedTime
> -Xloggc:/solr/logs/solr_gc.log -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=9 -XX:GCLogFileSize=20M
> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.port=18983 
> -Dcom.sun.management.jmxremote.rmi.port=18983
> -Djava.rmi.server.hostname=new-solr-c01.test3.cloud.cheggnet.com
> -DzkClientTimeout=15000
> -DzkHost=zookeeper1.test3.cloud.cheggnet.com:2181,zookeeper2.test3.cloud.cheggnet.com:2181,zookeeper3.test3.cloud.cheggnet.com:2181/solr-cloud
> -Dsolr.log.level=WARN
> -Dsolr.log.dir=/solr/logs -Djetty.port=8983 -DSTOP.PORT=7983
> -DSTOP.KEY=solrrocks -Dhost=new-solr-c01.test3.cloud.cheggnet.com
> -Duser.timezone=UTC -Djetty.home=/apps/solr6/server
> -Dsolr.solr.home=/apps/solr6/server/solr -Dsolr.install.dir=/apps/solr6
> -Dgraphite.prefix=solr-cloud.new-solr-c01 -Dgraphite.host=influx.test.cheggnet.com
> -javaagent:/apps/solr6/newrelic/newrelic.jar
> -Dnewrelic.environment=test3 -Dsolr.log.muteconsole -Xss256k
> -Dsolr.log.muteconsole -XX:OnOutOfMemoryError=/apps/solr6/bin/oom_solr.sh
> 8983 /solr/logs -jar start.jar --module=http
>
> I’m still confused why we are hitting OOM in 6.5.1 but weren’t in 6.3.0.
> Our load benchmarks use prod logs. We added suggesters, but those use
> analyzing infix, so they are search indexes, not in-memory.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On Nov 21, 2017, at 5:46 AM, Shawn Heisey  wrote:
> >
> > On 11/20/2017 6:17 PM, Walter Underwood wrote:
> >> When I ran load benchmarks with 6.3.0, an overloaded cluster would get
> super slow but keep functioning. With 6.5.1, we hit 100% CPU, then start
> getting OOMs. That is really bad, because it means we need to reboot every
> node in the cluster.
> >> Also, the JVM OOM hook isn’t running the process killer (JVM
> 1.8.0_121-b13). Using the G1 collector with the Shawn Heisey settings in an
> 8G heap.
> > 
> >> This is not good behavior in prod. The process goes to the bad place,
> then we need to wait until someone is paged and kills it manually. Luckily,
> it usually drops out of the live nodes for each collection and doesn’t take
> user traffic.
> >
> > There was a bug, fixed long before 6.3.0, where the OOM killer script
> wasn't working because the arguments enabling it were in the wrong place.
> It was fixed in 5.5.1 and 6.0.
> >
> > https://issues.apache.org/jira/browse/SOLR-8145
> >
> > If the scripts that you are using to get Solr started originated with a
> much older version of Solr than you are currently running, maybe you've got
> the arguments in the wrong order.
> >
> > Do you see the commandline arguments for the OOM killer (only available
> on *NIX systems, not Windows) on the admin UI dashboard?  If they are
> properly placed, you will see them on the dashboard, but if they aren't
> properly placed, then you won't see them.  This is what the argument looks
> like for one of my Solr installs:
> >
> > -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs
> >
> > Something which you probably already know:  If you're hitting OOM, you
> need a larger heap, or you need to adjust the config so it uses less
> memory.  There are no other ways to "fix" OOM problems.
> >
> > Thanks,
> > Shawn
>
>


Re: Combining complex joins with other criteria

2017-11-23 Thread David Frese

Am 23.11.17 um 20:13 schrieb Mikhail Khludnev:

Hello, David.
It should be like
q=name:"Mike" AND {!join from=pid to=id v=$qq}=city:"London" AND id:"a1"



Thanks a lot!

But this looks like working around a bug, given that

>> {!join from=pid to=id}(city:"London" AND id:"a1")

works fine. Is there any documentation about this? Or an open issue?

--
David Frese
+49 7071 70896 75

Active Group GmbH
Hechinger Str. 12/1, 72072 Tübingen
Registergericht: Amtsgericht Stuttgart, HRB 224404
Geschäftsführer: Dr. Michael Sperber


Re: Please help me with solr plugin

2017-11-23 Thread Alexandre Rafalovitch
Haven't done it myself, but maybe these could be useful:

https://github.com/DiceTechJobs/SolrPlugins
https://github.com/leonardofoderaro/alba
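
Not from those repositories, but for orientation, here is a bare-bones sketch of
what a custom update request processor plugin can look like; the class and field
names are made up for the example:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Example plugin: stamps every incoming document with a marker field.
public class MarkerUpdateProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        doc.setField("processed_by_s", "marker-plugin");  // hypothetical field
        super.processAdd(cmd);                            // hand off to the rest of the chain
      }
    };
  }
}

Build it into a jar loaded via a <lib> directive, register the factory in an
updateRequestProcessorChain in solrconfig.xml, and point your update handler at
that chain.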

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 21 November 2017 at 05:22, Zara Parst  wrote:
> Hi,
>
> I have spent too much time learning plugin development for Solr. I am about to
> give up. If someone has experience writing one, please contact me. I am open
> to all options. I want to learn it at any cost.
>
> Thanks
> Zara


docValues

2017-11-23 Thread Kojo
Hi,
I am working with Solr to develop a tool for analysis. I am using the search
function of Streaming Expressions, which requires a field to be indexed
with docValues enabled so that I can retrieve it.

Suppose that after someone finishes the analysis, they would like to get
other fields of the result set that do not have docValues enabled. How can
that be done?

Thanks


Re: Solr7: Very High number of threads on aggregator node

2017-11-23 Thread Rick Leir
Nawab
What do you see in the log file?

If nothing else solves the problem, then get a sample v7 solrconfig.xml and
use it, modified to suit your needs.
Cheers -- Rick

On November 22, 2017 11:38:13 AM EST, Nawab Zada Asad Iqbal  
wrote:
>Rick
>
>Your suspicion is correct. I mostly reused my config from solr4, except
>where it was deprecated or obsoleted and I switched to the newer configs.
>Having said that, I couldn't find any new query-related settings which can
>impact us, since most of our queries don't use fancy new features.
>
>I couldn't find a decent way to copy long XML here, so I created this
>Stack Overflow thread:
>
>https://stackoverflow.com/questions/47439503/solr-7-0-1-aggregator-node-spinning-many-threads
>
>
>Thanks!
>Nawab
>
>
>On Mon, Nov 20, 2017 at 3:10 PM, Rick Leir  wrote:
>
>> Nawab
>> Why it would be good to share the solrconfigs: I had a suspicion that
>you
>> might be using the same solrconfig for version 7 and 4.5. That is
>unlikely
>> to work well. But I could be way off base.
>> Rick
>> --
>> Sorry for being brief. Alternate email is rickleir at yahoo dot com
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Combining complex joins with other criteria

2017-11-23 Thread Mikhail Khludnev
Hello, David.
It should be like
q=name:"Mike" AND {!join from=pid to=id v=$qq}=city:"London" AND id:"a1"

On Thu, Nov 23, 2017 at 10:06 PM, David Frese 
wrote:

> Hi there everybody,
>
> I want to combine joins with multiple criteria and other criteria, with
> varying boolean operators, but I keep getting
> "org.apache.solr.search.SyntaxError: Cannot parse" errors. I tried with
> an installation of version 6.3, and also with the embedded server in
> version 7.1.
>
> Given these two documents:
>
> {"id": "p1", "name": "Mike"}
> {"id": "a1", "pid": "p1", "city": "London"}
>
> the following queries work just fine:
>
> {!join from=pid to=id}(city:"London" AND id:"a1")
> name:"Mike" AND id:"p1"
> name:"Mike" AND {!join from=pid to=id}city:"London"
>
> but when I start trying multiple criteria to the join, I keep getting
> parse errors with a message like "Cannot parse '(city:"London"':
> Encountered "" at line 1, column 39."
> Seems something is eating up (or not eating) the braces?
>
> I tried all the following variants already, and all fail:
>
> name:"Mike" AND {!join from=pid to=id}(city:"London" AND id:"a1")
>
> name:"Mike" AND ({!join from=pid to=id}(city:"London" AND id:"a1"))
>
> (name:"Mike") AND ({!join from=pid to=id}(city:"London" AND id:"a1"))
>
> (name:"Mike" AND {!join from=pid to=id}(city:"London" AND id:"a1"))
>
> Some other variants do parse, but are somehow misunderstood and do not
> yield any results, like:
>
> name:"Mike" AND ({!join from=pid to=id}city:"London" AND id:"a1")
>
>
>
> Thanks for any help!
>
> --
> David Frese
>



-- 
Sincerely yours
Mikhail Khludnev


Combining complex joins with other criteria

2017-11-23 Thread David Frese

Hi there everybody,

I want to combine joins with multiple criteria and other criteria, with
varying boolean operators, but I keep getting
"org.apache.solr.search.SyntaxError: Cannot parse" errors. I tried with
an installation of version 6.3, and also with the embedded server in
version 7.1.


Given these two documents:

{"id": "p1", "name": "Mike"}
{"id": "a1", "pid": "p1", "city": "London"}

the following queries work just fine:

{!join from=pid to=id}(city:"London" AND id:"a1")
name:"Mike" AND id:"p1"
name:"Mike" AND {!join from=pid to=id}city:"London"

but when I start trying multiple criteria to the join, I keep getting 
parse errors with a message like "Cannot parse '(city:"London"': 
Encountered "" at line 1, column 39."

Seems something is eating up (or not eating) the braces?

I tried all the following variants already, and all fail:

name:"Mike" AND {!join from=pid to=id}(city:"London" AND id:"a1")

name:"Mike" AND ({!join from=pid to=id}(city:"London" AND id:"a1"))

(name:"Mike") AND ({!join from=pid to=id}(city:"London" AND id:"a1"))

(name:"Mike" AND {!join from=pid to=id}(city:"London" AND id:"a1"))

Some other variants do parse, but are somehow misunderstood and do not 
yield any results, like:


name:"Mike" AND ({!join from=pid to=id}city:"London" AND id:"a1")



Thanks for any help!

--
David Frese


Spellchecker Results

2017-11-23 Thread Sadiki Latty
Hi all,

Is it possible to return the query results in addition to the spellcheck
suggestions WITHOUT sending another query request?
Example:
The client sends "educatione"; the response returns results for "education" as
well as noting that the term "educatione" was spell-corrected.


Thanks

Sid Latty


Re: Strip out punctuation at the end of token

2017-11-23 Thread Shawn Heisey

On 11/23/2017 8:06 AM, marotosg wrote:

I am trying to strip out any "."  at the end of a token but I would like to
keep the original token as well.
This is my index analyzer:

<analyzer type="index">
  <tokenizer .../>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="0"
          catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
  <filter ... preserveOriginal="false"/>
</analyzer>

I was thinking of using the solr.PatternReplaceFilterFactory but I see this
one won't keep the original token.


The WordDelimiterFilterFactory that you have configured will do that.

Here I have taken your analysis chain, added it to a test install of 
Solr, and tried it out.  It appears to be doing exactly what you want it 
to do.


https://www.dropbox.com/s/5puf7rzbypdcspu/wdf-analysis-marotosg.png?dl=0

Thanks,
Shawn


Re: Strip out punctuation at the end of token

2017-11-23 Thread Emir Arnautović
Hi Sergio,
You can use PatternCaptureGroupFilterFactory to emit both tokens. This token
filter is not covered in the recent documentation, but it is still there.
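
For illustration, here is a rough, untested sketch of the underlying filter at
the Lucene level; the pattern and input text are just examples. With the
preserve-original flag enabled it emits the captured group alongside the
original token:

import java.io.StringReader;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.pattern.PatternCaptureGroupTokenFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class TrailingDotDemo {
  public static void main(String[] args) throws Exception {
    WhitespaceTokenizer tokenizer = new WhitespaceTokenizer();
    tokenizer.setReader(new StringReader("Ltd. incorporated"));

    // Capture everything before a trailing dot; keep the original token too.
    TokenStream ts = new PatternCaptureGroupTokenFilter(
        tokenizer, true, Pattern.compile("^(.+)\\.$"));

    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      // Emits both "Ltd." and "Ltd" for the first token, and "incorporated" as-is.
      System.out.println(term.toString());
    }
    ts.end();
    ts.close();
  }
}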

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 23 Nov 2017, at 16:06, marotosg  wrote:
> 
> Hi all,
> 
> I am trying to strip out any "."  at the end of a token but I would like to
> keep the original token as well.
> This is my index analyzer
> 
> <analyzer type="index">
>   <tokenizer .../>
>   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
>   <filter ... preserveOriginal="false"/>
> </analyzer>
> 
> 
> i was thinking of using the solr.PatternReplaceFilterFactory but i see this
> one won't keep the original token.
> 
> Any help?
> 
> Thanks a lot
> Sergio Maroto
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Strip out punctuation at the end of token

2017-11-23 Thread marotosg
Hi all,

I am trying to strip out any "."  at the end of a token but I would like to
keep the original token as well.
This is my index analyzer:

<analyzer type="index">
  <tokenizer .../>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="0"
          catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
  <filter ... preserveOriginal="false"/>
</analyzer>


I was thinking of using the solr.PatternReplaceFilterFactory but I see this
one won't keep the original token.

Any help?

Thanks a lot
Sergio Maroto




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Result grouping performance

2017-11-23 Thread Kempelen , Ákos
Hi Mikhail,

group.facet=false, but group.truncate=true.
I found out that if the group.truncate parameter is true, the query is much
slower (about 5x), regardless of faceting.
I thought that the group.truncate param was meaningless without activating the
facet system, but this is not the case.

Testing environment: Solr 7.0.1, 1 core with 70GB data (~1.000.000 documents), 
no shards
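
For reference, here is a minimal SolrJ sketch of the request being timed; the
collection URL and group field are placeholders, and the only difference
between the two variants compared above is the group.truncate value:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class GroupTruncateComparison {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

    SolrQuery q = new SolrQuery("*:*");
    q.set("group", true);
    q.set("group.field", "somefield_s");   // placeholder group field
    q.set("group.facet", false);
    q.set("group.truncate", true);         // flip to false to compare timings

    int qTime = solr.query(q).getQTime();
    System.out.println("QTime=" + qTime);
    solr.close();
  }
}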

Akos



-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: Thursday, November 23, 2017 8:38 AM
To: solr-user 
Subject: Re: Result grouping performance

Akos,
Can you provide your request params? Do you just group and/or count grouped 
facets?
Can you clarify how field collapsing is different from grouping, just make it 
unambiguous?


On Wed, Nov 22, 2017 at 4:13 PM, Kempelen, Ákos < 
akos.kempe...@wolterskluwer.com> wrote:

> Hello,
>
> I am migrating our codebase from Solr 4.7 to 7.0.1 but the performance 
> of result grouping seems very poor using the newer Solr.
> For example a simple MatchAllDocsQuery takes 5 sec on Solr4.7, and 21 
> sec on Solr7.
> I wonder what causes the 4x difference in time? We hoped that newer
> Solr versions would provide better performance...
> Using field collapsing would be a solution, but it produces
> different facet counts.
> Thanks,
> Akos
>
>
>


--
Sincerely yours
Mikhail Khludnev


Re: Merging of index in Solr

2017-11-23 Thread Zheng Lin Edwin Yeo
Hi Shawn,

Thanks for the info. We will most likely be doing sharding when we migrate
to Solr 7.1.0 and re-index the data.

But as Solr 7.1.0 is not yet ready to index EML files due to this
JIRA, https://issues.apache.org/jira/browse/SOLR-11622, we have to make do
with our current Solr 6.5.1 first, which was set up without
sharding from the start.
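
When that re-index does happen, one option is to page the existing collection
with cursorMark and feed a new, already-sharded collection. The SolrJ sketch
below is a rough illustration of that approach only; it assumes all needed
fields are stored, and the URLs, the uniqueKey name (id) and the
_version_/copyField handling are assumptions, not from this thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CopyCollection {
  public static void main(String[] args) throws Exception {
    HttpSolrClient source = new HttpSolrClient.Builder("http://old:8983/solr/oldcoll").build();
    HttpSolrClient target = new HttpSolrClient.Builder("http://new:8983/solr/newcoll").build();

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(1000);
    q.setSort(SortClause.asc("id"));                     // cursorMark needs a sort on the uniqueKey
    String cursor = CursorMarkParams.CURSOR_MARK_START;

    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = source.query(q);
      for (SolrDocument doc : rsp.getResults()) {
        SolrInputDocument in = new SolrInputDocument();
        for (String field : doc.getFieldNames()) {
          if (field.equals("_version_")) continue;       // let the target assign versions
          in.addField(field, doc.getFieldValue(field));  // skip copyField targets too, if any
        }
        target.add(in);
      }
      String next = rsp.getNextCursorMark();
      if (next.equals(cursor)) break;                    // no more pages
      cursor = next;
    }
    target.commit();
    source.close();
    target.close();
  }
}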

Regards,
Edwin

On 23 November 2017 at 12:50, Shawn Heisey  wrote:

> On 11/22/2017 6:19 PM, Zheng Lin Edwin Yeo wrote:
>
>> I'm doing the merging on the SSD drive, the speed should be ok?
>>
>
> The speed of virtually all modern disks will have almost no influence on
> the speed of the merge.  The bottleneck isn't disk transfer speed, it's the
> operation of the merge code in Lucene.
>
> As I said earlier in this thread, a merge is **NOT** just a copy. Lucene
> must completely rebuild the data structures of the index to incorporate all
> of the segments of the source indexes into a single segment in the target
> index, while simultaneously *excluding* information from documents that
> have been deleted.
>
> The best speed I have ever personally seen for a merge is 30 megabytes per
> second.  This is far below the sustained transfer rate of a typical modern
> SATA disk.  SSD is capable of far faster data transfer ...but it will NOT
> make merges go any faster.
>
> We need to merge because the data are indexed in two different collections,
>> and we need them to be under the same collection, so that we can do things
>> like faceting more accurately.
>> Will sharding alone achieve this? Or do we have to merge first before we
>> do
>> the sharding?
>>
>
> If you want the final index to be sharded, it's typically best to index
> from scratch into a new empty collection that has the number of shards you
> want.  The merging tool you're using isn't aware of concepts like shards.
> It combines everything into a single index.
>
> It's not entirely clear what you're asking with the question about
> sharding alone.  Making a guess:  I have never heard of facet accuracy
> being affected by whether or not the index is sharded.  If that *is*
> possible, then I would expect an index that is NOT sharded to have better
> accuracy.
>
> Thanks,
> Shawn
>
>


Re: JSON-B deserialization of Solr-response with highlightning

2017-11-23 Thread Emir Arnautović
Hi Magnus,
Not sure if this is the right group for this question, and I have not coded
this part for a long time, and I am not sure I fully understood the issue, but
can you map the highlighting section to a Map?
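
Something along these lines, for example; the class and field names are just
made up for the sketch, and it is worth verifying that your JSON-B
implementation handles the nested generic Map declared on the field this way:

import java.util.List;
import java.util.Map;
import javax.json.bind.Jsonb;
import javax.json.bind.JsonbBuilder;

public class HighlightingDemo {
  // Dynamic doc-id keys can be absorbed into a Map instead of a fixed POJO.
  public static class SolrSearchResponse {
    public Map<String, Map<String, List<String>>> highlighting;
  }

  public static void main(String[] args) throws Exception {
    String json = "{\"highlighting\":{\"docId_0\":{\"title\":[\"some text.. queryword..\"]}}}";
    try (Jsonb jsonb = JsonbBuilder.create()) {
      SolrSearchResponse r = jsonb.fromJson(json, SolrSearchResponse.class);
      System.out.println(r.highlighting.get("docId_0").get("title"));
    }
  }
}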

Also, not sure if you are using this example in your tests, but you are
missing a quote before the docId_0 field.

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 23 Nov 2017, at 12:30, Magnus Ebbesson  
> wrote:
> 
> Hi,
> 
> 
> We're started to migrate our integration-framework to move over to JavaEE 
> JSON-B as default json-serialization /deserialization framework and now the 
> highlighning component is giving us some troubles. Here's a constructed 
> example of the JSON response from Solr.
> 
> {
>  "responseHeader":{
>"status":0,
>"QTime":8,
>"params":{
>  "q":"queryword",
>  "indent":"on",
>  "rows":"1",
>  "wt":"json"}},
>  "response":{"numFound":123,"start":0,"docs":[
>  {
>"title":"some text.. Queryword some more text..",
>"id":"docId_0"}]
>  },
>  "highlighting":{
>docId_0":{
>  "title":["some text.. queryword.. some more text"]}
> 
>   }
> 
> }
> 
> 
> The JSON-B Spec (JSR- 367 Java API for JSON Binding ("Specification") 
> Version: 1.0) specifically details that when JSON Binding implementation 
> during deserialization encounters key in key/value pair that it does not 
> recognize, it should treat the rest of the JSON document as if the element 
> simply did not appear, and in particular, the implementation MUST NOT treat 
> this as an error condition.
> 
> 
> docId_0 being a dynamic key for the highlighting-response we're unable to 
> pick up and with doc-ids being totally dynamic, creating any kind of known 
> mapping for an object isn't feasible.
> 
> 
> Are there any ideas on how to solve this without having to implement a 
> custom-parser into our framework for this specific use case?
> 
> Solr-J has been ruled previously due to policies and we've used Jackson 
> Any-type in the previous version of the framework, but it has been now been 
> replaced by JSON-B standards..
> 
> 
> BR
> 
> Magnus Ebbesson
> 
> 
> 



JSON-B deserialization of Solr-response with highlightning

2017-11-23 Thread Magnus Ebbesson
Hi,


We've started to migrate our integration framework over to Java EE
JSON-B as the default JSON serialization/deserialization framework, and now the
highlighting component is giving us some trouble. Here's a constructed example
of the JSON response from Solr.

{
  "responseHeader":{
"status":0,
"QTime":8,
"params":{
  "q":"queryword",
  "indent":"on",
  "rows":"1",
  "wt":"json"}},
  "response":{"numFound":123,"start":0,"docs":[
  {
"title":"some text.. Queryword some more text..",
"id":"docId_0"}]
  },
  "highlighting":{
docId_0":{
  "title":["some text.. queryword.. some more text"]}

   }

}


The JSON-B spec (JSR 367, Java API for JSON Binding ("Specification"), Version
1.0) specifically details that when a JSON Binding implementation encounters,
during deserialization, a key in a key/value pair that it does not recognize, it
should treat the rest of the JSON document as if the element simply did not
appear, and in particular, the implementation MUST NOT treat this as an error
condition.


With docId_0 being a dynamic key, we're unable to pick up the highlighting
response, and with doc ids being totally dynamic, creating any kind of known
mapping for an object isn't feasible.


Are there any ideas on how to solve this without having to implement a 
custom-parser into our framework for this specific use case?

SolrJ has been ruled out previously due to policies, and we used Jackson's
Any-type in the previous version of the framework, but it has now been
replaced by the JSON-B standard.


BR

Magnus Ebbesson





Re: Reusable tokenstream

2017-11-23 Thread Roxana Danger
That's great!! Got it.
Thank you very much.
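
For the record, here is a rough, untested sketch of the TeeSinkTokenFilter idea
Emir describes below, following the 5.4 Javadoc he linked; the analyzer and
field names are only placeholders, and the verb/adjective filtering would be
stacked on top of each sink:

import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.sinks.TeeSinkTokenFilter;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class SharedAnalysisSketch {
  public static void main(String[] args) throws Exception {
    StandardAnalyzer shared = new StandardAnalyzer();   // stand-in for the shared analysis chain
    TokenStream source = shared.tokenStream("tokens", new StringReader("some text to analyse"));

    // Tee the shared stream: each sink replays the same cached tokens,
    // so field-specific filters can consume them without re-running the analysis.
    TeeSinkTokenFilter tee = new TeeSinkTokenFilter(source);
    TokenStream verbsSink = tee.newSinkTokenStream();       // wrap with a verb-only filter here
    TokenStream adjectivesSink = tee.newSinkTokenStream();  // wrap with an adjective-only filter here

    tee.reset();
    tee.consumeAllTokens();   // run the shared analysis once, filling both sinks
    tee.end();
    tee.close();

    CharTermAttribute term = verbsSink.addAttribute(CharTermAttribute.class);
    verbsSink.reset();
    while (verbsSink.incrementToken()) {
      System.out.println(term.toString());
    }
    verbsSink.end();
    verbsSink.close();
    shared.close();
  }
}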


On Wed, Nov 22, 2017 at 5:07 PM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Roxana,
> The idea with update request processor is to have following parameters:
> * inputField - document field with text to analyse
> * sharedAnalysis - field type with shared analysis definition
> * targetFields - comma separated list of fields where results should be
> stored.
> * fieldSpecificAnalysis - comma separated list of field types that defines
> specifics for each field (reusing schema will have extra tokenizer that
> should be ignored)
>
> Your update processor uses TeeSinkTokenFilter to create tokens for each
> field, but you do not write those tokens to index. You add new fields to
> document where each token is new value (or can concat and have whitespace
> tokenizer in indexing analysis chain of target field). You can remove
> inputField from document.
>
> HTH,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 22 Nov 2017, at 17:46, Roxana Danger  wrote:
> >
> > Hi Emir,
> > In this case, I need more control at Lucene level, so I have to use the
> > lucene index writer directly. So, I can not use Solr for importing.
> > Or, is there anyway I can add a tokenstream to a SolrInputDocument (is
> > there any other class exposed by Solr during indexing that I can use for
> > this purpose?).
> > Am I correct or still missing something?
> > Thank you.
> >
> >
> > On Wed, Nov 22, 2017 at 11:33 AM, Emir Arnautović <
> > emir.arnauto...@sematext.com> wrote:
> >
> >> Hi Roxana,
> >> I think you can use https://lucene.apache.org/
> core/5_4_0/analyzers-common/
> >> org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html <
> >> https://lucene.apache.org/core/5_4_0/analyzers-common/
> >> org/apache/lucene/analysis/sinks/TeeSinkTokenFilter.html> like
> suggested
> >> earlier.
> >>
> >> HTH,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 22 Nov 2017, at 11:43, Roxana Danger 
> wrote:
> >>>
> >>> Hi Emir,
> >>> Many thanks for your reply.
> >>> The UpdateProcessor can do this work, but is
> analyzer.reusableTokenStream
> >>>  >> apache/lucene/analysis/Analyzer.html#reusableTokenStream(java.lang.
> String,
> >>> java.io.Reader)> the way to obtain a previous generated tokenstream? is
> >> it
> >>> guarantee to get access to the token stream and not reconstruct it?
> >>> Thanks,
> >>> Roxana
> >>>
> >>>
> >>> On Wed, Nov 22, 2017 at 10:26 AM, Emir Arnautović <
> >>> emir.arnauto...@sematext.com> wrote:
> >>>
>  Hi Roxana,
>  I don’t think that it is possible. In some cases (seems like yours is
> >> good
>  fit) you could create custom update request processor that would do
> the
>  shared analysis (you can have it defined in schema) and after analysis
> >> use
>  those tokens to create new values for those two fields and remove
> source
>  value (or flag it as ignored in schema).
> 
>  HTH,
>  Emir
>  --
>  Monitoring - Log Management - Alerting - Anomaly Detection
>  Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> 
> 
> 
> > On 22 Nov 2017, at 11:09, Roxana Danger 
> >> wrote:
> >
> > Hello all,
> >
> > I would like to reuse the tokenstream generated for one field, to
> >> create
>  a
> > new tokenstream (adding a few filters to the available tokenstream),
> >> for
> > another field without the need of executing again the whole analysis.
> >
> > The particular application is:
> > - I have field *tokens* that uses an analyzer that generate the
> tokens
>  (and
> > maintains the token type attributes)
> > - I would like to have another two new fields: *verbs* and
> >> *adjectives*.
> > These should reuse the tokenstream generated for the field *tokens*
> and
> > filter the verbs and adjectives for the respective fields.
> >
> > Is this feasible? How should it be implemented?
> >
> > Many thanks.
> 
> 
> >>
> >>
>
>