Re: Pagination with streaming expressions

2019-05-01 Thread Erick Erickson
This sounds like an XY problem. You’re asking how to paginate, but not 
explaining the problem you want to solve by paginating.

I don’t immediately see what purpose paginating serves here. What significance 
does a page have for gatherNodes? What use would the _user_ have for 
these results? Especially for two unrelated queries. IOW if for query1 you 
count something for page 13, and for query2 you also count something for page 
13, what information is the user getting in those two cases? Especially if the 
total result set for query1 is 1,000 docs but for query2 is 10,000,000 docs?

But in general no, streaming is orthogonal to most use-cases for pagination and 
isn’t really supported, except by reading through the returned tuples and 
throwing away the first N pages, which is probably pretty inefficient.
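
A minimal sketch of that read-and-discard approach, assuming the sorted tuples 
are already available client-side as a plain Java Iterator. The class name, the 
page() helper and the zero-based page index are hypothetical and not part of 
Solr; the point is only that every tuple before the requested page still has to 
be read, so the cost grows with the page number.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SkipPagination {

    // Hypothetical helper: reads the stream from the start, discards the first
    // (page * pageSize) items, then collects up to pageSize items.
    // "page" is zero-based.
    static <T> List<T> page(Iterator<T> tuples, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        long toSkip = (long) page * pageSize;
        while (toSkip > 0 && tuples.hasNext()) {
            tuples.next(); // read and thrown away
            toSkip--;
        }
        List<T> result = new ArrayList<>();
        while (result.size() < pageSize && tuples.hasNext()) {
            result.add(tuples.next());
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for tuples already sorted by count.
        List<String> sorted = Arrays.asList("n1", "n2", "n3", "n4", "n5");
        System.out.println(page(sorted.iterator(), 1, 2)); // second page of size 2
    }
}
```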

Erick

> On May 1, 2019, at 1:28 PM, Pratik Patel  wrote:
> 
> Hello Everyone,
> 
> Is there a way to paginate the results of a Streaming Expression?
> 
> Let's say I have a simple gatherNodes function which has a count operation at
> the end of it. I can sort by the count fine, but now I would like to be able
> to select a specific subset of the results based on pagination parameters. Is
> there any way to do that?
> 
> Thanks!
> Pratik



Update Solr 7.7 Reference Guide graceTime -> graceDuration

2019-05-01 Thread bban954
When setting up a Scheduled Trigger for Solr autoscaling I was running into
errors that graceTime was an undefined property. I found some discussion on
patch changes that had updated this property name to graceDuration, which
should probably be reflected in the reference guide at
https://lucene.apache.org/solr/guide/7_7/solrcloud-autoscaling-triggers.html#scheduled-trigger.
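
For illustration, a rough sketch of what a set-trigger payload might look like 
with the renamed property. Only the graceTime -> graceDuration rename comes 
from the report above; the trigger name is made up, the other property names 
(name, event, startTime, every) are written from memory of the reference guide 
example and should be checked against it, and the actions list is left out for 
brevity.

```java
public class ScheduledTriggerPayload {

    // Builds a set-trigger JSON body by hand. All argument values are
    // illustrative; "graceDuration" is the property name the report says
    // replaced "graceTime".
    static String setTriggerJson(String name, String startTime,
                                 String every, String graceDuration) {
        return "{\n"
            + "  \"set-trigger\": {\n"
            + "    \"name\": \"" + name + "\",\n"
            + "    \"event\": \"scheduled\",\n"
            + "    \"startTime\": \"" + startTime + "\",\n"
            + "    \"every\": \"" + every + "\",\n"
            + "    \"graceDuration\": \"" + graceDuration + "\"\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Hypothetical trigger name and schedule.
        System.out.println(
            setTriggerJson("maintenance_trigger", "NOW", "+1DAY", "+1HOUR"));
    }
}
```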





Re: Term Freq Vector with SOLR cell?

2019-05-01 Thread Erik Hatcher
q=doc_content? Try q=id:""

Solr Cell and DIH are comparable (in that they are about getting content into 
Solr) but "unrelated" to TVRH.   TVRH is about inspecting indexed content, 
regardless of how it got in.
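
A small sketch of such a request, assuming the /tvrh handler and field name 
from the question below. The document id "doc-1", the class name and the 
build() helper are hypothetical; the tv.* parameters are the Term Vector 
Component options as I understand them, and each has to be spelled out as a 
name=value pair separated by "&".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class TvrhUrl {

    // URL-encodes one query-string component as UTF-8.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds a term-vector request for a single document selected by id.
    // Note: docId is assumed not to contain double quotes.
    static String build(String baseUrl, String docId, String field) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "id:\"" + docId + "\"");
        params.put("tv", "true");
        params.put("tv.fl", field);
        params.put("tv.tf", "true");
        params.put("tv.positions", "true");
        params.put("tv.offsets", "true");

        StringBuilder url = new StringBuilder(baseUrl).append('?');
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return url.toString();
    }

    public static void main(String[] args) {
        System.out.println(
            build("http://localhost:8983/solr/myCore/tvrh", "doc-1", "doc_content"));
    }
}
```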

Erik


> On May 1, 2019, at 3:14 PM, Geoffrey Willis  
> wrote:
> 
> I am using Solr in a web app to extract text from .pdf, and docx files. I was 
> wondering if I can access the TermFreq and TermPosition vectors via the HTTP 
> interface exposed by Solr Cell. I’m posting/getting documents fine, I’ve 
> enabled the TV, TFV etc in the managed schema:
> 
>  stored="true" termPayloads="true" termPositions="true" termVectors="true"/>
> 
> And use a get request similar to :
> 
>   
> http://localhost:8983/solr/myCore/tvrh?q=doc_content=true=true=true=true=true
>  s=true=includes
> 
> When I look in the browser network tab, I see that the query went in as 
> expected with tv=true, tv.positions= true etc. But no Term Positions/Offsets 
> in the results. I’ve done similar using the Data Import Handler with java, 
> but looking for a web solution. Before I “Roll my own” Term Vector, thought 
> I’d see if it’s available from Solr Cell.



SolrPlugin update existing documents in newSearcher()

2019-05-01 Thread Maria Muslea
Hi,

I have a plugin that extends the AbstractSolrEventListener. I override the
newSearcher() method and the plan is to add some extra functionality,
namely updating existing documents by setting new values for existing
fields as well as adding new fields to the documents.

I can see that the plugin is invoked and I can get the list of documents,
but I cannot update existing fields or add new fields. I have tried various
approaches, but I cannot get it to work.

If you have any suggestions I would really appreciate it. The code that I
am currently trying is below.

Thank you,
Maria

    for (DocIterator iter = docs.iterator(); iter.hasNext();) {
        int doci = iter.nextDoc();
        Document document = newSearcher.doc(doci);

        SolrInputDocument solrInputDocument1 = new SolrInputDocument();
        AddUpdateCommand addUpdateCommand1 = new AddUpdateCommand(req);
        addUpdateCommand1.clear();
        solrInputDocument1.setField("id", document.get("id"));
        solrInputDocument1.addField("newfield", "newvalue");
        solrInputDocument1.setField("existingfield", "value");
        addUpdateCommand1.solrDoc = solrInputDocument1;
        getCore().getUpdateHandler().addDoc(addUpdateCommand1);

        SolrQueryResponse re = new SolrQueryResponse();
        SolrQueryRequest rq = new LocalSolrQueryRequest(getCore(), new ModifiableSolrParams());
        CommitUpdateCommand commit = new CommitUpdateCommand(rq, false);
        getCore().getUpdateHandler().commit(commit);
    }


Term Freq Vector with SOLR cell?

2019-05-01 Thread Geoffrey Willis
I am using Solr in a web app to extract text from .pdf, and docx files. I was 
wondering if I can access the TermFreq and TermPosition vectors via the HTTP 
interface exposed by Solr Cell. I’m posting/getting documents fine, I’ve 
enabled the TV, TFV etc in the managed schema:

 stored="true" termPayloads="true" termPositions="true" termVectors="true"/>

And use a get request similar to:

http://localhost:8983/solr/myCore/tvrh?q=doc_content=true=true=true=true=true
  s=true=includes

When I look in the browser network tab, I see that the query went in as 
expected with tv=true, tv.positions= true etc. But no Term Positions/Offsets in 
the results. I’ve done similar using the Data Import Handler with java, but 
looking for a web solution. Before I “Roll my own” Term Vector, thought I’d see 
if it’s available from Solr Cell. 

Pagination with streaming expressions

2019-05-01 Thread Pratik Patel
Hello Everyone,

Is there a way to paginate the results of a Streaming Expression?

Let's say I have a simple gatherNodes function which has a count operation at
the end of it. I can sort by the count fine, but now I would like to be able
to select a specific subset of the results based on pagination parameters. Is
there any way to do that?

Thanks!
Pratik


Re: problem indexing GPS metadata for video upload

2019-05-01 Thread Tim Allison
Related?

https://issues.apache.org/jira/plugins/servlet/mobile#issue/TIKA-2861


On Wed, May 1, 2019 at 8:09 AM Alexandre Rafalovitch 
wrote:

> What happens when you run it against a standalone Tika (recommended option
> anyway)? Do you see the relevant fields?
>
> Not every Tika field is captured; which ones are is configured in
> solrconfig.xml. So if Tika extracts them, the next step is to check the mapping.
>
> Regards,
>  Alex
>
> On Wed, May 1, 2019, 5:38 AM Where is Where,  wrote:
>
> > uploading video to solr via tika
> >
> >
> https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
> > The index has no video GPS metadata which is extracted and indexed for
> > images such as jpeg. I have checked both MP4 and MOV files, the files I
> > checked all have GPS Exif data embedded in the same fields as image. Any
> > idea? Thanks!
> >
>


Re: problem indexing GPS metadata for video upload

2019-05-01 Thread Alexandre Rafalovitch
What happens when you run it against a standalone Tika (recommended option
anyway)? Do you see the relevant fields?

Not every Tika field is captured; which ones are is configured in
solrconfig.xml. So if Tika extracts them, the next step is to check the mapping.

Regards,
 Alex

On Wed, May 1, 2019, 5:38 AM Where is Where,  wrote:

> uploading video to solr via tika
>
> https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
> The index has no video GPS metadata which is extracted and indexed for
> images such as jpeg. I have checked both MP4 and MOV files, the files I
> checked all have GPS Exif data embedded in the same fields as image. Any
> idea? Thanks!
>


problem indexing GPS metadata for video upload

2019-05-01 Thread Where is Where
uploading video to solr via tika
https://lucene.apache.org/solr/guide/7_7/uploading-data-with-solr-cell-using-apache-tika.html
The index has no video GPS metadata which is extracted and indexed for
images such as jpeg. I have checked both MP4 and MOV files, the files I
checked all have GPS Exif data embedded in the same fields as image. Any
idea? Thanks!