Re: Model type does not exist MultipleAdditiveTreesModel

2019-04-07 Thread Kamuela Lau
Hi Roee,

In addition to the blank "features" param in the model JSON, you should also
make sure that the features you reference in the JSON have actually been
added to the feature store; this same generic error occurs when they haven't.

Also, for MultipleAdditiveTreesModel, make sure you have quotes around the
values, as shown here:
https://lucene.apache.org/solr/7_3_1/solr-ltr/org/apache/solr/ltr/model/MultipleAdditiveTreesModel.html

If you do not add quotes, the same 'Model type does not exist' error will
occur.
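
For example, a minimal model along those lines might look like the following
(just a rough sketch; it assumes a feature named originalScore has already
been added to the feature store, and note that the numeric values are quoted
as strings):

{
   "class" : "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
   "name" : "my",
   "features" : [
       { "name" : "originalScore" }
   ],
   "params" : {
       "trees" : [
           {
               "weight" : "1",
               "root" : {
                   "feature" : "originalScore",
                   "threshold" : "0.5",
                   "left" : { "value" : "-10" },
                   "right" : { "value" : "10" }
               }
           }
       ]
   }
}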

Hope this helps!

On Sat, Apr 6, 2019 at 1:03 AM Kamal Kishore Aggarwal 
wrote:

> Hi Roee,
>
> It looks like the error is due to the blank "features" param value in the
> JSON.
>
> "name" : "my",
>"features":[],
>"params" : {
>
> I have observed that Solr LTR often returns the generic 'Model type does not
> exist' error when the actual problem turns out to be an issue with the JSON.
> Just wanted to share my experience.
>
> Regards
> Kamal
>
> On Thu, May 31, 2018 at 4:07 PM Roee T  wrote:
>
> > Hi all,
> > I'm trying to upload the simplest possible model to Solr 7.3.1 and I get
> > an error:
> >
> > the model:
> >
> > {
> >"class" : "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
> >"name" : "my",
> >"features":[],
> >"params" : {
> >"trees" : [
> >{
> >"weight" : 1,
> >"root" : {
> >"value" : -10
> >}} ]}}
> >
> > The error:
> >   "error":{
> > "metadata":[
> >   "error-class","org.apache.solr.common.SolrException",
> >   "root-error-class","java.lang.IllegalArgumentException"],
> > "msg":"org.apache.solr.ltr.model.ModelException: Model type does not
> > exist org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
> > "code":400}}
> >
> >
> > I added the configuration to solrconfig.xml like
> >> regex=".*\.jar" />
> > and started Solr using -Dsolr.ltr.enabled=true
> >
> > Please help me.
> > Thank you all ;)
> >
> >
> >
> > --
> > Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
> >
>


Re: Need help on LTR

2019-03-22 Thread Kamuela Lau
I think the issue is that you store the feature as originalScore, but in
your model you refer to it as original_score.
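
For reference, a sketch of the model file with that name aligned to the
feature store (everything else kept as in your original):

{
  "store": "exampleFeatureStore",
  "class": "org.apache.solr.ltr.model.LinearModel",
  "name": "exampleModelStore",
  "features": [
    { "name": "isCityName" },
    { "name": "isLat" },
    { "name": "originalScore" }
  ],
  "params": {
    "weights": {
      "isCityName": 0.0,
      "isLat": 0.0,
      "originalScore": 1.0
    }
  }
}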

On Wed, Mar 20, 2019 at 1:58 PM Mohomed Rimash  wrote:

> One more thing I noticed: your feature params values aren't wrapped in a q
> or qf field. Check that as well.
>
> On Wed, 20 Mar 2019 at 01:34, Amjad Khan  wrote:
>
> > Did, but same error
> >
> > {
> >   "responseHeader":{
> > "status":400,
> > "QTime":5},
> >   "error":{
> > "metadata":[
> >   "error-class","org.apache.solr.common.SolrException",
> >   "root-error-class","java.lang.NullPointerException"],
> > "msg":"org.apache.solr.ltr.model.ModelException: Model type does not
> > exist org.apache.solr.ltr.model.LinearModel",
> > "code":400}}
> >
> >
> >
> > > On Mar 19, 2019, at 3:26 PM, Mohomed Rimash 
> > wrote:
> > >
> > > Please update the weight values to be greater than 0 and less than 1.
> > >
> > > On Wed, 20 Mar 2019 at 00:13, Amjad Khan  wrote:
> > >
> > >> Feature File
> > >> ===
> > >>
> > >> [
> > >>  {
> > >>"store" : "exampleFeatureStore",
> > >>"name" : "isCityName",
> > >>"class" : "org.apache.solr.ltr.feature.FieldValueFeature",
> > >>"params" : { "field" : "CITY_NAME" }
> > >>  },
> > >>  {
> > >>"store" : "exampleFeatureStore",
> > >>"name" : "originalScore",
> > >>"class" : "org.apache.solr.ltr.feature.OriginalScoreFeature",
> > >>"params" : {}
> > >>  },
> > >>  {
> > >>"store" : "exampleFeatureStore",
> > >>"name" : "isLat",
> > >>"class" : "org.apache.solr.ltr.feature.FieldValueFeature",
> > >>"params" : { "field" : "LATITUDE" }
> > >>  }
> > >> ]
> > >>
> > >> Model File
> > >> ==
> > >> {
> > >>  "store": "exampleFeatureStore",
> > >>  "class": "org.apache.solr.ltr.model.LinearModel",
> > >>  "name": "exampleModelStore",
> > >>  "features": [{
> > >>  "name": "isCityName"
> > >>},
> > >>{
> > >>  "name": "isLat"
> > >>},
> > >>{
> > >>  "name": "original_score"
> > >>}
> > >>  ],
> > >>  "params": {
> > >>"weights": {
> > >>  "isCityName": 0.0,
> > >>  "isLat": 0.0,
> > >>  "original_score": 1.0
> > >>}
> > >>  }
> > >> }
> > >>
> > >>
> > >>
> > >>> On Mar 19, 2019, at 2:04 PM, Mohomed Rimash 
> > >> wrote:
> > >>>
> > >>> Can you share the feature file and the model file?
> > >>> 1. I had a few instances where invalid values for parameters (i.e.
> > >>> weights set to more than 1, with MinMaxNormalizer) resulted in the
> > >>> above error.
> > >>> 2. Check that all the features added to the model have a weight under
> > >>> params -> weights in the model.
> > >>>
> > >>>
> > >>> On Tue, 19 Mar 2019 at 21:21, Roopa Rao  wrote:
> > >>>
> >  Do your feature definitions and the feature names used in the model
> >  match?
> > 
> >  On Tue, Mar 19, 2019 at 10:17 AM Amjad Khan 
> > >> wrote:
> > 
> > > Yes, I did.
> > >
> > > I can see the features that I created via
> > > schema/feature-store/exampleFeatureStore, and it returns the features I
> > > created. But the issue is when I try to put the store-model.
> > >
> > >> On Mar 19, 2019, at 12:18 AM, Mohomed Rimash <
> rim...@yaalalabs.com>
> > > wrote:
> > >>
> > >> Hi Amjad, after adding the libraries to the path, did you restart
> > >> Solr?
> > >>
> > >> On Tue, 19 Mar 2019 at 08:45, Amjad Khan 
> > wrote:
> > >>
> > >>> I followed the Solr LTR Documentation
> > >>>
> > >>> https://lucene.apache.org/solr/guide/7_4/learning-to-rank.html <
> > >>> https://lucene.apache.org/solr/guide/7_4/learning-to-rank.html>
> > >>>
> > >>> 1. Added the library to the solrconfig
> > >>> 
> > >>>  > >>> regex=".*\.jar" />
> > >>>  > >>> regex="solr-ltr-\d.*\.jar" />
> > >>> 2. Successfully added the feature
> > >>> 3. Got the schema to see that the feature is available
> > >>> 4. When I try to push the model I see the error below, even though I
> > >>> added the lib into solrconfig
> > >>>
> > >>> Response
> > >>> {
> > >>> "responseHeader":{
> > >>>  "status":400,
> > >>>  "QTime":1},
> > >>> "error":{
> > >>>  "metadata":[
> > >>>"error-class","org.apache.solr.common.SolrException",
> > >>>"root-error-class","java.lang.NullPointerException"],
> > >>>  "msg":"org.apache.solr.ltr.model.ModelException: Model type does
> >  not
> > >>> exist org.apache.solr.ltr.model.LinearModel",
> > >>>  "code":400}}
> > >>>
> > >>> Thanks
> > >
> > >
> > 
> > >>
> > >>
> >
> >
>


Nested geofilt query for LTR feature

2019-03-14 Thread Kamuela Lau
Hello,

I'm currently using Solr 7.2.2 and trying to use the LTR contrib module to
rerank queries.
For my LTR model, I would like to use a feature that is essentially a
"normalized distance," a value between 0 and 1 which is based on distance.

When using geodist() to define a feature in the feature store, I received a
"failed to parse feature query" error, and thus I am using the below
geofilt query for distance.

{
  "name":"dist",
  "class":"org.apache.solr.ltr.feature.SolrFeature",
  "params":{"q":"{!geofilt sfield=latlon score=kilometers filter=false
pt=${ltrpt} d=5000}"},
  "store":"ltrFeatureStore"
}

This feature correctly returns the distance between ltrpt and the sfield
latlon (LatLonPointSpatialField).
As I mentioned previously, I would like a feature which uses this distance
in another function. To test this functionality, I tried to define a
feature which multiplies the distance by two:

{
  "name":"twoDist",
  "class":"org.apache.solr.ltr.feature.SolrFeature",
  "params":{"q":"{!func}product(2,query({!geofilt v= sfield=latlon
score=kilometers filter=false pt=${ltrpt} d=5000},0.0))"},
  "store":"ltrFeatureStore"
}

When trying to extract this feature, I receive the following error:

java.lang.RuntimeException: Exception from createWeight for SolrFeature
[name=multDist, params={q={!func}product(2,query({!geofilt v= sfield=latlon
score=kilometers filter=false pt=${ltrpt} d=5000},0.0))}]  missing sfield
for spatial request

However, when I define the following in fl for a regular, non-reranked
query, I find that it is correctly parsed and I receive the correct value,
which is twice the value of geodist() (pt2 is defined in a different part
of the query):
fl=score,geodist(),{!func}product(2,query({!geofilt v= sfield=latlon
score=kilometers filter=false pt=${pt2} d=5},0.0))

For reference, below is what I have defined in my schema:

   


Is this the correct, intended behavior? If so, is my query for this
correct, or should I go about extracting this sort of feature a different
way?


Re: Indexing PDF file in Apache SOLR via Apache TIKA

2018-10-30 Thread Kamuela Lau
Hi there,

Here are a couple of ways I'm aware of:

1. Extract-handler / post tool
You can use the curl command with the extract handler or bin/post to upload
a single document.
Reference:
https://lucene.apache.org/solr/guide/7_5/uploading-data-with-solr-cell-using-apache-tika.html

2. DataImportHandler
This could be used for, say, uploading multiple documents with Tika.
Reference:
https://lucene.apache.org/solr/guide/7_5/uploading-structured-data-store-data-with-the-data-import-handler.html#the-tikaentityprocessor

You should also be able to do it via the admin page, so long as you define
and modify the extract handler in solrconfig.xml.
Reference:
https://lucene.apache.org/solr/guide/7_5/documents-screen.html#file-upload
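
For example, a minimal sketch of option 1, assuming a core named
"techproducts" (the core name, document id and file path are placeholders):

curl "http://localhost:8983/solr/techproducts/update/extract?literal.id=doc1&commit=true" \
  -F "myfile=@/path/to/mydocument.pdf"

or, with the post tool:

bin/post -c techproducts /path/to/mydocument.pdf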

Hope this helps!

On Tue, Oct 30, 2018 at 3:40 PM adiyaksa kevin 
wrote:

> Hello there, let me introduce myself. My name is Mohammad Kevin Putra (you
> can call me Kevin), from Indonesia. I am a beginner backend developer, I
> use Linux Mint, and I use Apache Solr 7.5.0 and Apache Tika 1.91.0.
>
> I have a small problem with how to index a PDF file via Apache Tika. I
> understand how Solr and Tika work on their own, but I don't know how they
> are integrated.
> The last thing I know is that Tika can extract the PDF file I upload and
> parse it into data/metadata automatically, and then I just have to copy &
> paste that into the "Documents" tab of the Solr core.
> The questions are:
> 1. Can I upload a PDF file to Solr via Tika in GUI mode, or only from the
> CLI? If only from the CLI, can you please explain it to me?
> 2. Is it possible to show a text result in the "Query" tab?
>
> The background for asking this is: I want to index PDFs on my local
> system, then just upload them, "drag & drop" style, into Solr (is that
> possible?), and then when I type something into the search box the result
> looks like this:
> (Title of doc)
> blablablabla (yellow highlighted result) blablabla.
> The blablabla text is a couple of sentences. That's all I need.
> Sorry for my bad English.
> Thanks for reading and replying, it will be very helpful to me.
> Thanks a lot
>


Re: LTR features on solr

2018-10-26 Thread Kamuela Lau
I have never done such a thing myself, but I think that a dynamic field
would probably be the way to go.

I've not used it myself, but you might also be able to do what you want
with payloads:

https://lucene.apache.org/solr/guide/7_5/function-queries.html#payload-function

https://lucidworks.com/2017/09/14/solr-payloads/
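
As a very rough sketch of the payload idea (the field and keys here are made
up): with a delimited-payloads field such as the *_dpf dynamic field from the
default configset, you could index something like

  "clicks_dpf": "laptop|12.0 tablet|3.0"

and then read a per-key value back with the payload() function, e.g.

  fl=id,score,payload(clicks_dpf,laptop,0.0)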

Hope that answers your question.

Fri, Oct 26, 2018, 18:28 Midas A :

> *Thanks for the reply. Please find my answers below inline.*
>
>
> On Fri, Oct 26, 2018 at 2:41 PM Kamuela Lau  wrote:
>
> > Hi,
> >
> > Just to confirm, are you asking about the following?
> >
> > For a particular query, you have a list of documents, and for each
> > document, you have data
> > on the number of times the document was clicked on, added to a cart, and
> > ordered, and you
> > would like to use this data for features. Is this correct?
> > *[ME] :Yes*
> > If this is the case, are you indexing that data?
> >
>*[ME]* : *Yes, we are planning to index the data, but my question is how we
> should store it in Solr.*
> * Should I create a dynamic field to store the click, cart and order
> data per query for each document?*
> * Please guide me on how we should store it. *
>
> >
> > I believe that the features which can be used by the LTR module are
> > either indexed information, or indexed information that has been
> > manipulated through function queries.
> >
> > https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html
> >
> > It seems to me that you would have to index the click data frequently, if
> > you need to refresh the data frequently.
> >
>   *  [ME] : We are planning to refresh this data weekly.*
>
> >
> > On Fri, Oct 26, 2018 at 4:24 PM Midas A  wrote:
> >
> > > Hi  All,
> > >
> > > I am new to implementing Solr LTR, so I am facing a few challenges.
> > > Broadly, we have 3 kinds of features:
> > > a) Based on the query
> > > b) Based on the document
> > > *c) Based on query-document data from clicks, cart additions and orders
> > > from tracker data.*
> > >
> > > So my question here is how to store the c) type of features
> > >- old queries and the corresponding clicks (query-clicks)
> > > - old query - cart addition data  and
> > >   - old query - order data
> > >  in Solr to run an LTR model,
> > > and secondly how to build features for query-clicks, query-cart and
> > > query-orders, because we need to refresh this data frequently.
> > >
> > > What approach should I follow?
> > >
> > > Hope I was able to explain my problem.
> > >
> >
>


Re: LTR features on solr

2018-10-26 Thread Kamuela Lau
Hi,

Just to confirm, are you asking about the following?

For a particular query, you have a list of documents, and for each
document, you have data
on the number of times the document was clicked on, added to a cart, and
ordered, and you
would like to use this data for features. Is this correct?

If this is the case, are you indexing that data?

I believe that the features which can be used by the LTR module are either
indexed information, or indexed information that has been manipulated
through function queries.

https://lucene.apache.org/solr/guide/7_5/learning-to-rank.html

It seems to me that you would have to index the click data frequently, if
you need to refresh the data frequently.
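
As a rough sketch (the store and field names here are made up): once a
per-document click count is indexed, it could be exposed to the LTR module
with a feature definition like

{
  "store": "clickFeatureStore",
  "name": "docClickCount",
  "class": "org.apache.solr.ltr.feature.FieldValueFeature",
  "params": { "field": "click_count" }
}

and the query-dependent signals (query-clicks and so on) would similarly
need to be indexed in some form that a feature query can reach.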

On Fri, Oct 26, 2018 at 4:24 PM Midas A  wrote:

> Hi  All,
>
> I am new to implementing Solr LTR, so I am facing a few challenges.
> Broadly, we have 3 kinds of features:
> a) Based on the query
> b) Based on the document
> *c) Based on query-document data from clicks, cart additions and orders from
> tracker data.*
>
> So my question here is how to store the c) type of features
>- old queries and the corresponding clicks (query-clicks)
> - old query - cart addition data  and
>   - old query - order data
>  in Solr to run an LTR model,
> and secondly how to build features for query-clicks, query-cart and
> query-orders, because we need to refresh this data frequently.
>
> What approach should I follow?
>
> Hope I was able to explain my problem.
>


Re: DIH for TikaEntityProcessor

2018-10-12 Thread Kamuela Lau
Glad to help :)

Fri, Oct 12, 2018, 21:10 Martin Frank Hansen (MHQ) :

> You sir just made my day!!!
>
> It worked!!! Thanks a million!
>
>
> Martin Frank Hansen,
>
> -Original message-
> From: Kamuela Lau 
> Sent: 12 October 2018 11:41
> To: solr-user@lucene.apache.org
> Subject: Re: DIH for TikaEntityProcessor
>
> Also, just wondering, have you tried to specify dataSource="bin" for
> read_file?
>
> On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau  wrote:
>
> > Hi,
> >
> > I was unable to reproduce the error that you got with the information
> > provided.
> > Below are the data-config.xml and managed-schema fields I used; the
> > data-config is mostly the same (I think that BinFileDataSource doesn't
> > actually require a dataSource, so I think it's safe to put
> > dataSource="null"):
> >
> > 
> >   
> >   
> >> baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
> > rootEntity="false" dataSource="bin" onError="skip">
> > 
> >  > url="${files.fileAbsolutePath}">
> >   
> > 
> >   
> >   
> > 
> >
> > And from the managed schema:
> >  > required="true" multiValued="false" />
> > 
> > 
> >  > docValues="false" />
> >  > multiValued="true"/>
> >
> > When I had field column="text" name="content", the documents were
> > still indexed, but the text/content was not (as I had no content field
> > in the schema).
> > I used the default config, and Solr version 7.5.0; I was able to
> > import the data just fine (I also tested with .*DOC). Is there any
> > other information you can provide that can help me reproduce this error?
> >
> >
> >
> >
> > On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) 
> > wrote:
> >
> >> Hi again,
> >>
> >>
> >>
> >> Can anybody help me? Any suggestions as to why I am getting the error
> >> below?
> >>
> >>
> >>
> >>
> >>
> >> *Martin Frank Hansen*, Senior Data Analytiker
> >>
> >> Data, IM & Analytics
> >>
> >>
> >>
> >> Lautrupparken 40-42, DK-2750 Ballerup E-mail m...@kmd.dk  Web
> >> www.kmd.dk Mobil +4525571418
> >>
> >>
> >>
> >> *From:* Martin Frank Hansen (MHQ)
> >> *Sent:* 10 October 2018 10:15
> >> *To:* solr-user 
> >> *Subject:* DIH for TikaEntityProcessor
> >>
> >>
> >>
> >> Hi,
> >>
> >>
> >>
> >> I am trying to read documents from a file system into Solr, using
> >> dataimporthandler but keep getting the following errors:
> >>
> >>
> >>
> >> Exception while processing: files document :
> >> null:org.apache.solr.handler.dataimport.DataImportHandlerException:
> >> java.lang.ClassCastException: java.io.InputStreamReader cannot be
> >> cast to java.io.InputStream
> >>
> >>  at
> >> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAnd
> >> Throw(DataImportHandlerException.java:61)
> >>
> >>  at
> >> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(Ent
> >> ityProcessorWrapper.java:270)
> >>
> >>  at
> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde
> >> r.java:476)
> >>
> >>  at
> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde
> >> r.java:517)
> >>
> >>  at
> >> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilde
> >> r.java:415)
> >>
> >>  at
> >> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.j
> >> ava:330)
> >>
> >>  at
> >> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java
> >> :233)
> >>
> >>  at
> >> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImpo
> >> rter.java:424)
> >>
> >>  at
> >> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.j
> >> ava:483)
> >>
> >>  at
>

Re: DIH for TikaEntityProcessor

2018-10-12 Thread Kamuela Lau
Also, just wondering, have you tried to specify dataSource="bin" for
read_file?

On Fri, Oct 12, 2018 at 6:38 PM Kamuela Lau  wrote:

> Hi,
>
> I was unable to reproduce the error that you got with the information
> provided.
> Below are the data-config.xml and managed-schema fields I used; the
> data-config is mostly the same
> (I think that BinFileDataSource doesn't actually require a dataSource, so
> I think it's safe to put dataSource="null"):
>
> 
>   
>   
>baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
> rootEntity="false" dataSource="bin" onError="skip">
> 
>  url="${files.fileAbsolutePath}">
>   
> 
>   
>   
> 
>
> And from the managed schema:
>  required="true" multiValued="false" />
> 
> 
>  docValues="false" />
>  multiValued="true"/>
>
> When I had field column="text" name="content", the documents were still
> indexed, but the text/content was not (as I had no content field in the
> schema).
> I used the default config, and Solr version 7.5.0; I was able to import
> the data just fine (I also tested with .*DOC). Is there any other
> information you can provide that can help me reproduce this error?
>
>
>
>
> On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) 
> wrote:
>
>> Hi again,
>>
>>
>>
>> Can anybody help me? Any suggestions as to why I am getting the error below?
>>
>>
>>
>>
>>
>> *Martin Frank Hansen*, Senior Data Analytiker
>>
>> Data, IM & Analytics
>>
>>
>>
>> Lautrupparken 40-42, DK-2750 Ballerup
>> E-mail m...@kmd.dk  Web www.kmd.dk
>> Mobil +4525571418
>>
>>
>>
>> *From:* Martin Frank Hansen (MHQ)
>> *Sent:* 10 October 2018 10:15
>> *To:* solr-user 
>> *Subject:* DIH for TikaEntityProcessor
>>
>>
>>
>> Hi,
>>
>>
>>
>> I am trying to read documents from a file system into Solr, using
>> dataimporthandler but keep getting the following errors:
>>
>>
>>
>> Exception while processing: files document : 
>> null:org.apache.solr.handler.dataimport.DataImportHandlerException: 
>> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to 
>> java.io.InputStream
>>
>>  at 
>> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>>
>>  at 
>> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>>
>>  at 
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>>
>>  at 
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>>
>>  at 
>> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>>
>>  at 
>> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>>
>>  at 
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>>
>>  at 
>> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>>
>>  at 
>> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>>
>>  at 
>> org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>>
>>  at java.lang.Thread.run(Thread.java:748)
>>
>> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be 
>> cast to java.io.InputStream
>>
>>  at 
>> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>>
>>  at 
>> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>>
>>  ... 9 more
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> Full Import failed:java.lang.RuntimeException:
>> java.lang.RuntimeException:
>> org.apache.solr.handler.dataimport.DataImportHandlerException:
>> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to
>> java.io.InputStream
>>
>>  at
>> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:2

Re: DIH for TikaEntityProcessor

2018-10-12 Thread Kamuela Lau
Hi,

I was unable to reproduce the error that you got with the information
provided.
Below are the data-config.xml and managed-schema fields I used; the
data-config is mostly the same
(I think that BinFileDataSource doesn't actually require a dataSource, so I
think it's safe to put dataSource="null"):
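
Roughly, that data-config has the following overall shape (just a sketch;
the entity names and the id mapping below are approximate):

<dataConfig>
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <entity name="files" processor="FileListEntityProcessor"
            baseDir="/path/to/sampleData" fileName=".*doc" recursive="true"
            rootEntity="false" dataSource="bin" onError="skip">
      <!-- map the file path onto a stored field, e.g. the id -->
      <field column="fileAbsolutePath" name="id"/>
      <entity name="tika" processor="TikaEntityProcessor"
              url="${files.fileAbsolutePath}">
        <field column="text" name="text"/>
      </entity>
    </entity>
  </document>
</dataConfig>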


  
  
  


  

  
  


And from the managed schema:






When I had field column="text" name="content", the documents were still
indexed, but the text/content was not (as I had no content field in the
schema).
I used the default config, and Solr version 7.5.0; I was able to import the
data just fine (I also tested with .*DOC). Is there any other information
you can provide that can help me reproduce this error?




On Fri, Oct 12, 2018 at 4:11 PM Martin Frank Hansen (MHQ) 
wrote:

> Hi again,
>
>
>
> Can anybody help me? Any suggestions as to why I am getting the error below?
>
>
>
>
>
> *Martin Frank Hansen*, Senior Data Analytiker
>
> Data, IM & Analytics
>
>
>
> Lautrupparken 40-42, DK-2750 Ballerup
> E-mail m...@kmd.dk  Web www.kmd.dk
> Mobil +4525571418
>
>
>
> *From:* Martin Frank Hansen (MHQ)
> *Sent:* 10 October 2018 10:15
> *To:* solr-user 
> *Subject:* DIH for TikaEntityProcessor
>
>
>
> Hi,
>
>
>
> I am trying to read documents from a file system into Solr, using
> dataimporthandler but keep getting the following errors:
>
>
>
> Exception while processing: files document : 
> null:org.apache.solr.handler.dataimport.DataImportHandlerException: 
> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to 
> java.io.InputStream
>
>  at 
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>
>  at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>
>  at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>
>  at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>
>  at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
>
>  at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>
>  at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>
>  at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>
>  at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>
>  at 
> org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>
>  at java.lang.Thread.run(Thread.java:748)
>
> Caused by: java.lang.ClassCastException: java.io.InputStreamReader cannot be 
> cast to java.io.InputStream
>
>  at 
> org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:132)
>
>  at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
>
>  ... 9 more
>
>
>
>
>
>
>
>
>
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to
> java.io.InputStream
>
>  at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:271)
>
>  at
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
>
>  at
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
>
>  at
> org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
>
>  at java.lang.Thread.run(Thread.java:748)
>
> Caused by: java.lang.RuntimeException:
> org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to
> java.io.InputStream
>
>  at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:417)
>
>  at
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
>
>  at
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
>
>  ... 4 more
>
> Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException:
> java.lang.ClassCastException: java.io.InputStreamReader cannot be cast to
> java.io.InputStream
>
>  at
> org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:61)
>
>  at
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:270)
>
>  at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
>
>  at
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:517)
>
> 

Re: Is it possible to use limit and sort with BlockJoin Facet?

2017-12-31 Thread Kamuela Lau
Thank you very much for your help.

2017/12/29 15:46 "Mikhail Khludnev" :

> On Fri, Dec 29, 2017 at 4:37 AM, Kamuela Lau 
> wrote:
>
> > Hello,
> >
> > Thank you very much for the confirmation.
> >
> > As BJQFacet doesn't support limit, I have the impression that if the
> number
> > of documents is large,
> > there would be a noticeable decrease in performance. Is this correct?
> >
>
> No. Limiting facet almost never makes it faster.
>
>
> >
> > If so, I was considering instead using JSON Facet API to create
> > parent/child relationships with facets, however I could not find an
> > official document with specific details.
> > JSON Facet API appears to be quite fast,  but all the documentation I
> could
> > find was at the link below:
> > http://yonik.com/json-facet-api/
> >
> > I would like to investigate more about how to use JSON Facet API, so any
> > information or a point in the right direction would be very helpful.
> >
> https://lucene.apache.org/solr/guide/7_2/json-facet-api.html
>
> You are welcome.
>
> >
> > Thanks,
> >
> > On Fri, Dec 29, 2017 at 6:50 AM, Mikhail Khludnev 
> wrote:
> >
> > > Hello,
> > > Block join works in the single core only. Please check the docs.
> > > BJQFacet doesn't support limit and sort.
> > >
> > > On Thu, Dec 28, 2017 at 12:39 PM, Kamuela Lau 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I am currently trying to figure out a way to apply a parent/child
> > > > relationship between two cores to searches.
> > > > I thought that BlockJoin Facet could be a good way to do this, but I
> > > would
> > > > like to know, is it impossible to use limit and sort with BlockJoin
> > > Facet?
> > > >
> > > > Thanks,
> > > > Kamu
> > > >
> > > >
> > > > On Tue, Dec 26, 2017 at 2:31 PM, Kamuela Lau 
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am working on applying parent/child types to search using Solr,
> and
> > > > > wanted to inquire about two things.
> > > > >
> > > > > 1) BlockJoin Facet
> > > > >
> > > > > Currently, there are two cores. The first core (the item core) has
> 16
> > > > > million documents, and the second core (the product core) has 3
> > million
> > > > > documents.
> > > > > The product core is the parent, and the item core is the child.
> > > > >
> > > > > To apply this parent/child relationship to searches, I thought
> that I
> > > > > could use BlockJoin Facet,
> > > > > but because the number of documents being searched is so large, I
> am
> > > > > worried about the decrease in performance of Solr when using
> > BlockJoin
> > > > > Facet.
> > > > >
> > > > > I thought it may be possible to avoid dips in performance while
> using
> > > > > BlockJoin Facet by using limit and sort,
> > > > > however when I tried to look up more information about this, I was
> > > unsure
> > > > > of if that is possible.
> > > > >
> > > > > When using BlockJoin Facet, is it not possible to use limit and
> sort?
> > > > >
> > > > > 2) JSON Facet API
> > > > >
> > > > > While I was researching about the above problem, I came across JSON
> > > Facet
> > > > > API.
> > > > > Is it possible to do the same thing as BlockJoin Facet with JSON
> > Facet
> > > > API?
> > > > > If it is possible, what is the most appropriate way to use each
> one?
> > > > >
> > > > > I was unable to find an official document with more detailed
> > > information
> > > > > about Solr's JSON Facet API,
> > > > > so if anyone knows of a relevant resource or reference material, I
> > > would
> > > > > be grateful if you could share it.
> > > > >
> > > > > I apologize for the rather open-ended question, but an answer or
> > even a
> > > > > point in the right direction (an article
> > > > > or resource) would be greatly appreciated.
> > > > >
> > > > > Thanks,
> > > > > Kamu
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Is it possible to use limit and sort with BlockJoin Facet?

2017-12-28 Thread Kamuela Lau
Hello,

Thank you very much for the confirmation.

As BJQFacet doesn't support limit, I have the impression that if the number
of documents is large,
there would be a noticeable decrease in performance. Is this correct?

If so, I was considering instead using JSON Facet API to create
parent/child relationships with facets, however I could not find an
official document with specific details.
JSON Facet API appears to be quite fast, but all the documentation I could
find was at the link below:
http://yonik.com/json-facet-api/

I would like to investigate more about how to use JSON Facet API, so any
information or a point in the right direction would be very helpful.
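
For example, something along these lines is what I have in mind with the
JSON Facet API, assuming the parent and child documents were indexed
together in a single core and the parents are marked with a field like
doc_type:product (the field names here are made up):

{
  "query": "*:*",
  "filter": "doc_type:product",
  "facet": {
    "child_categories": {
      "type": "terms",
      "field": "category",
      "limit": 10,
      "sort": "count desc",
      "domain": { "blockChildren": "doc_type:product" }
    }
  }
}

which, as I understand it, would also allow limit and sort on the child
facet.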

Thanks,

On Fri, Dec 29, 2017 at 6:50 AM, Mikhail Khludnev  wrote:

> Hello,
> Block join works in the single core only. Please check the docs.
> BJQFacet doesn't support limit and sort.
>
> On Thu, Dec 28, 2017 at 12:39 PM, Kamuela Lau 
> wrote:
>
> > Hi All,
> >
> > I am currently trying to figure out a way to apply a parent/child
> > relationship between two cores to searches.
> > I thought that BlockJoin Facet could be a good way to do this, but I
> would
> > like to know, is it impossible to use limit and sort with BlockJoin
> Facet?
> >
> > Thanks,
> > Kamu
> >
> >
> > On Tue, Dec 26, 2017 at 2:31 PM, Kamuela Lau 
> > wrote:
> >
> > > Hi,
> > >
> > > I am working on applying parent/child types to search using Solr, and
> > > wanted to inquire about two things.
> > >
> > > 1) BlockJoin Facet
> > >
> > > Currently, there are two cores. The first core (the item core) has 16
> > > million documents, and the second core (the product core) has 3 million
> > > documents.
> > > The product core is the parent, and the item core is the child.
> > >
> > > To apply this parent/child relationship to searches, I thought that I
> > > could use BlockJoin Facet,
> > > but because the number of documents being searched is so large, I am
> > > worried about the decrease in performance of Solr when using BlockJoin
> > > Facet.
> > >
> > > I thought it may be possible to avoid dips in performance while using
> > > BlockJoin Facet by using limit and sort,
> > > however when I tried to look up more information about this, I was
> unsure
> > > of if that is possible.
> > >
> > > When using BlockJoin Facet, is it not possible to use limit and sort?
> > >
> > > 2) JSON Facet API
> > >
> > > While I was researching about the above problem, I came across JSON
> Facet
> > > API.
> > > Is it possible to do the same thing as BlockJoin Facet with JSON Facet
> > API?
> > > If it is possible, what is the most appropriate way to use each one?
> > >
> > > I was unable to find an official document with more detailed
> information
> > > about Solr's JSON Facet API,
> > > so if anyone knows of a relevant resource or reference material, I
> would
> > > be grateful if you could share it.
> > >
> > > I apologize for the rather open-ended question, but an answer or even a
> > > point in the right direction (an article
> > > or resource) would be greatly appreciated.
> > >
> > > Thanks,
> > > Kamu
> > >
> > >
> > >
> > >
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Is it possible to use limit and sort with BlockJoin Facet?

2017-12-28 Thread Kamuela Lau
Hi All,

I am currently trying to figure out a way to apply a parent/child
relationship between two cores to searches.
I thought that BlockJoin Facet could be a good way to do this, but I would
like to know, is it impossible to use limit and sort with BlockJoin Facet?

Thanks,
Kamu


On Tue, Dec 26, 2017 at 2:31 PM, Kamuela Lau  wrote:

> Hi,
>
> I am working on applying parent/child types to search using Solr, and
> wanted to inquire about two things.
>
> 1) BlockJoin Facet
>
> Currently, there are two cores. The first core (the item core) has 16
> million documents, and the second core (the product core) has 3 million
> documents.
> The product core is the parent, and the item core is the child.
>
> To apply this parent/child relationship to searches, I thought that I
> could use BlockJoin Facet,
> but because the number of documents being searched is so large, I am
> worried about the decrease in performance of Solr when using BlockJoin
> Facet.
>
> I thought it may be possible to avoid dips in performance while using
> BlockJoin Facet by using limit and sort,
> however when I tried to look up more information about this, I was unsure
> of if that is possible.
>
> When using BlockJoin Facet, is it not possible to use limit and sort?
>
> 2) JSON Facet API
>
> While I was researching about the above problem, I came across JSON Facet
> API.
> Is it possible to do the same thing as BlockJoin Facet with JSON Facet API?
> If it is possible, what is the most appropriate way to use each one?
>
> I was unable to find an official document with more detailed information
> about Solr's JSON Facet API,
> so if anyone knows of a relevant resource or reference material, I would
> be grateful if you could share it.
>
> I apologize for the rather open-ended question, but an answer or even a
> point in the right direction (an article
> or resource) would be greatly appreciated.
>
> Thanks,
> Kamu
>
>
>
>
>


Is it possible to use limit and sort with BlockJoin Facet?

2017-12-25 Thread Kamuela Lau
Hi,

I am working on applying parent/child types to search using Solr, and
wanted to inquire about two things.

1) BlockJoin Facet

Currently, there are two cores. The first core (the item core) has 16
million documents, and the second core (the product core) has 3 million
documents.
The product core is the parent, and the item core is the child.

To apply this parent/child relationship to searches, I thought that I could
use BlockJoin Facet,
but because the number of documents being searched is so large, I am
worried about the decrease in performance of Solr when using BlockJoin
Facet.

I thought it may be possible to avoid dips in performance while using
BlockJoin Facet by using limit and sort,
however when I tried to look up more information about this, I was unsure
whether that is possible.

When using BlockJoin Facet, is it not possible to use limit and sort?

2) JSON Facet API

While I was researching about the above problem, I came across JSON Facet
API.
Is it possible to do the same thing as BlockJoin Facet with JSON Facet API?
If it is possible, what is the most appropriate way to use each one?

I was unable to find an official document with more detailed information
about Solr's JSON Facet API,
so if anyone knows of a relevant resource or reference material, I would be
grateful if you could share it.

I apologize for the rather open-ended question, but an answer or even a
point in the right direction (an article
or resource) would be greatly appreciated.

Thanks,
Kamu