Ah, good to know!
I'm actually using lower-level calls, as I'm building the TokenStream by
hand from UIMA annotations and not using any analyzer, but I'll keep that
in mind for future projects. Thanks!
On Thu, Jun 15, 2017 at 12:10 PM Erick Erickson
wrote:
> José:
>
> Do note that, while the byt
José:
Do note that, while the bytearray isn't limited, prior to LUCENE-7705
most of the tokenizers you would use limited the incoming token to 256
characters at most. This is not at all a _Lucene_ limitation at a low level,
rather if you're indexing data with a delimited payload (say
abc|your_payload_here) t
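To make the point above concrete, here is a rough sketch in plain Java (no Lucene dependency) of why a delimited payload counts against the tokenizer's token length: the tokenizer sees the whole `abc|your_payload_here` string as one token before anything splits it. The class name `DelimitedToken`, the split on the first delimiter, and the UTF-8 payload encoding are assumptions made for illustration; the 256 figure is simply the one quoted in the message.

```java
import java.nio.charset.StandardCharsets;

// Illustrative model only; this is not a Lucene class.
final class DelimitedToken {
    final String term;
    final byte[] payload;

    private DelimitedToken(String term, byte[] payload) {
        this.term = term;
        this.payload = payload;
    }

    // Split "term|payload" on the first delimiter (an assumption of this sketch).
    // A token without a delimiter gets an empty payload.
    static DelimitedToken parse(String raw, char delimiter) {
        int i = raw.indexOf(delimiter);
        if (i < 0) {
            return new DelimitedToken(raw, new byte[0]);
        }
        String term = raw.substring(0, i);
        byte[] payload = raw.substring(i + 1).getBytes(StandardCharsets.UTF_8);
        return new DelimitedToken(term, payload);
    }

    // The limit applies to the raw token, i.e. term + delimiter + payload text,
    // because the tokenizer runs before the payload is split off.
    static boolean fitsTokenLimit(String raw, int maxTokenLength) {
        return raw.length() <= maxTokenLength;
    }
}
```

So a short term with a long payload can still hit the limit, even though the payload bytes themselves are not restricted.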
Hi Markus, thanks for your response!
Now I feel stupid, that is clearly a much simpler approach, and it has the
added benefit that it does not require me to meddle in the scoring
process, which I'm still a bit terrified of. Thanks for the tip.
I guess the question is still valid though? i.e. h
encode.
> >
> > Thanks!
> > Markus
> >
> > -Original message-
> > > From:Erick Erickson
> > > Sent: Wednesday 14th June 2017 23:29
> > > To: java-user
> > > Subject: Re: Using POS payloads for chunking
> > >
s. Payloads are versatile!
> > >
> > > The downside of payloads is that they are limited to 8 bits. Although
> we can easily fit our reduced treebank in there, we also use single bits to
> signal for compound/subword, and stemmed/unstemmed and some others.
> >
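For readers wondering what a tag plus single-bit flags in one byte might look like, here is a minimal sketch in plain Java. The layout is hypothetical and not taken from the thread: 5 bits for a POS tag id and 3 flag bits, with the names `PosPayload`, `FLAG_COMPOUND`, `FLAG_SUBWORD` and `FLAG_STEMMED` invented for illustration.

```java
// Hypothetical one-byte layout (an assumption, not the poster's actual scheme):
// bits 0-4 hold a POS tag id (0..31), bits 5-7 are single-bit flags.
final class PosPayload {
    static final int TAG_MASK = 0x1F;
    static final int FLAG_COMPOUND = 1 << 5;
    static final int FLAG_SUBWORD = 1 << 6;
    static final int FLAG_STEMMED = 1 << 7;

    // Pack a tag id and three flags into a single byte.
    static byte encode(int tagId, boolean compound, boolean subword, boolean stemmed) {
        if (tagId < 0 || tagId > TAG_MASK) {
            throw new IllegalArgumentException("tag id does not fit in 5 bits: " + tagId);
        }
        int b = tagId;
        if (compound) b |= FLAG_COMPOUND;
        if (subword) b |= FLAG_SUBWORD;
        if (stemmed) b |= FLAG_STEMMED;
        return (byte) b;
    }

    // Recover the tag id from the low five bits.
    static int tagId(byte payload) {
        return payload & TAG_MASK;
    }

    // Test one flag; the byte is widened to int, but masking with a
    // single-bit flag ignores the sign-extended upper bits.
    static boolean hasFlag(byte payload, int flag) {
        return (payload & flag) != 0;
    }
}
```

With this layout, `PosPayload.encode(17, true, false, true)` stores tag 17 with the compound and stemmed flags set, and a scorer would read them back with `tagId` and `hasFlag`.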
-Original message-
> From:Erick Erickson
> Sent: Wednesday 14th June 2017 23:29
> To: java-user
> Subject: Re: Using POS payloads for chunking
>
> Markus:
>
> I don't believe that payloads are limited in size at all. LUCENE-7705
> was done in part because there
ds,
> Markus
>
> -Original message-
>> From:Erik Hatcher
>> Sent: Wednesday 14th June 2017 23:03
>> To: java-user@lucene.apache.org
>> Subject: Re: Using POS payloads for chunking
>>
>> Markus - how are you encoding payloads as bitsets and using them for scori
23:03
> To: java-user@lucene.apache.org
> Subject: Re: Using POS payloads for chunking
>
> Markus - how are you encoding payloads as bitsets and using them for scoring?
> Curious to see how folks are leveraging them.
>
> Erik
>
> > On Jun 14, 2017, at 4:45 PM, Mar
Markus - how are you encoding payloads as bitsets and using them for scoring?
Curious to see how folks are leveraging them.
Erik
> On Jun 14, 2017, at 4:45 PM, Markus Jelsma wrote:
>
> Hello,
>
> We use POS-tagging too, and encode them as payload bitsets for scoring, which
> is, as f
Hello,
We use POS-tagging too, and encode the tags as payload bitsets for scoring, which
is, as far as I know, the only possibility with payloads.
So, instead of encoding them as payloads, why not index your treebank's POS tags
as tokens on the same position, like synonyms? If you do that, you can
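As a rough illustration of the "same position, like synonyms" idea, the sketch below models it in plain Java without any Lucene dependency. Each word advances the position by 1 and its POS tag follows with a position increment of 0, which is how stacked tokens are expressed through Lucene's PositionIncrementAttribute. The `StackedTokens` class and the `_pos_` prefix that keeps tags apart from ordinary terms are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; in Lucene the increment would be set on the
// token stream's PositionIncrementAttribute instead.
final class StackedTokens {
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Emit each word (increment 1) followed by its tag (increment 0),
    // so the tag lands on the same position as the word.
    static List<Token> stack(String[] words, String[] tags) {
        if (words.length != tags.length) {
            throw new IllegalArgumentException("words and tags must be parallel arrays");
        }
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Token(words[i], 1));
            out.add(new Token("_pos_" + tags[i], 0)); // hypothetical prefix
        }
        return out;
    }

    // Resolve increments into absolute positions, starting at 0.
    static int[] absolutePositions(List<Token> tokens) {
        int[] positions = new int[tokens.size()];
        int current = -1;
        for (int i = 0; i < tokens.size(); i++) {
            current += tokens.get(i).positionIncrement;
            positions[i] = current;
        }
        return positions;
    }
}
```

Because a word and its tag share a position, position-based queries can mix the two, and no custom scoring code is involved.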