RE: [COMPRESS] .iwa files within latest iOS iWorks files?

2016-05-11 Thread Allison, Timothy B.

>Why? I'm not asking you, Tim, I just wonder why would they invent their 
>own dialect of an existing format? The stream identifier is 14 bytes and 
>the checksum is a few bytes per chunk that are actually helpful.

Couldn't agree more.

-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



RE: [COMPRESS] .iwa files within latest iOS iWorks files?

2016-05-04 Thread Allison, Timothy B.
And the internet has answers...according to 
http://fileformats.archiveteam.org/wiki/IWA

"However, the variant of Snappy that is used does not comply with the spec for 
that format, omitting the stream identifier and checksum."

I've opened:
https://issues.apache.org/jira/browse/COMPRESS-352
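Given the wiki's description, an .iwa stream could be walked chunk by chunk before handing each payload to a raw Snappy decoder. A minimal sketch, assuming the chunk layout documented on the ArchiveTeam page (a 4-byte header: one type byte, usually 0x00, then a 24-bit little-endian length of the compressed payload); `IwaChunks` is a hypothetical helper, not part of Commons Compress:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class IwaChunks {
    // Splits an .iwa stream into its compressed payloads. Each chunk starts
    // with a 4-byte header: a type byte, then a 24-bit little-endian length
    // of the Snappy-compressed payload that follows.
    public static List<byte[]> split(InputStream in) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        int type;
        while ((type = in.read()) != -1) {
            int b0 = in.read(), b1 = in.read(), b2 = in.read();
            if (b0 == -1 || b1 == -1 || b2 == -1) {
                throw new IOException("truncated chunk header");
            }
            int length = b0 | (b1 << 8) | (b2 << 16);   // little-endian
            byte[] payload = new byte[length];
            int off = 0;
            while (off < length) {
                int n = in.read(payload, off, length - off);
                if (n == -1) throw new IOException("truncated chunk payload");
                off += n;
            }
            // each payload is a raw Snappy block with no stream identifier
            // or checksum, so a SNAPPY_RAW-style decoder should handle it
            chunks.add(payload);
        }
        return chunks;
    }
}
```

Decompressing each payload separately (e.g. one SnappyCompressorInputStream per chunk) and concatenating the results should reconstruct the archive data, if the wiki's description is accurate.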


-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org] 
Sent: Wednesday, May 4, 2016 8:46 AM
To: Commons Users List <user@commons.apache.org>
Subject: [COMPRESS] .iwa files within latest iOS iWorks files?

All,
  Over on Tika, we recently had a user post some new Apple iWorks files.  These 
are zips that now contain mostly .iwa files.  I found one mention on the 
internet that these might be snappy compressed, but I'm not having luck opening 
them.  I get only -1 with SNAPPY_RAW, and I get an exception with SNAPPY_FRAMED.
  The files are posted on our jira [0].
  Any recommendations?  Thank you.

  Best,

   Tim

[0] https://issues.apache.org/jira/browse/TIKA-1966





RE: [commons-io] TeeInputStream that ignores skip/reset?

2015-12-17 Thread Allison, Timothy B.
Right, that's the use case.  In Tika, we have no control over what our 
dependencies are doing to the stream.  

The current implementation does a mark/reset for digesting then parsing... up 
to a certain limit, after which we cache to disk and then digest then parse the 
tmp file separately.  

The downside to this (TIKA-1701) is that, for truncated zip/package files, the 
digester reads to the end of the stream for an embedded file and hits the zip 
exception; the parser then fails to extract the contents of as many files as 
it would have if it had just been parsing the file without the digester.

If skip/reset don't make any sense for a DigestingInputStream generally, I'll 
keep our modified TeeInputStream over in Tika land.
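For reference, one way to sketch the kind of stream described above (a hypothetical `SequentialDigestInputStream`, not an existing commons-io class): track a high-water mark so bytes replayed after reset() are not digested twice, and implement skip() by reading so skipped bytes still reach the digest.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;

public class SequentialDigestInputStream extends FilterInputStream {
    private final MessageDigest digest;
    private long pos = 0;       // current stream position
    private long digested = 0;  // high-water mark: count of bytes already digested
    private long markPos = -1;

    public SequentialDigestInputStream(InputStream in, MessageDigest digest) {
        super(in);
        this.digest = digest;
    }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) {
            if (pos >= digested) {          // only digest bytes not seen before
                digest.update((byte) b);
                digested = pos + 1;
            }
            pos++;
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) {
            // digest only the tail of this read that lies past the high-water mark
            long fresh = Math.max(0L, (pos + n) - Math.max(pos, digested));
            if (fresh > 0) {
                digest.update(buf, (int) (off + n - fresh), (int) fresh);
                digested = pos + n;
            }
            pos += n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // read instead of skipping, so the digest still sees every byte
        byte[] buf = new byte[8192];
        long total = 0;
        while (total < n) {
            int r = read(buf, 0, (int) Math.min(buf.length, n - total));
            if (r < 0) break;
            total += r;
        }
        return total;
    }

    @Override
    public synchronized void mark(int readlimit) {
        in.mark(readlimit);
        markPos = pos;
    }

    @Override
    public synchronized void reset() throws IOException {
        in.reset();
        pos = markPos;  // replayed bytes stay below the high-water mark
    }
}
```

With this, the digest at end of stream equals the digest of a plain sequential read, regardless of how the consumer mixed in mark/reset/skip along the way.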

If there are other recommendations for handling this, let me know.

Thank you!

Best,

  Tim

-----Original Message-----
From: sebb [mailto:seb...@gmail.com] 
Sent: Wednesday, December 16, 2015 1:07 PM
To: Commons Users List <user@commons.apache.org>
Subject: Re: [commons-io] TeeInputStream that ignores skip/reset?

I'm not sure what the use case for this is, apart from avoiding the bug in 
DigestingInputStream.
Which can be avoided by not using skip/reset.

I'm not sure that skip/reset make any sense for a DigestingInputStream anyway.


On 16 December 2015 at 12:19, Allison, Timothy B. <talli...@mitre.org> wrote:
> All,
>   Over on Tika, we'd like a DigestingInputStream that ignores skip/reset 
> (unlike Java's v <= 1.8 [0]).  Before we reinvent the wheel, is there an 
> InputStream similar to TeeInputStream that ignores skip/reset, so that the 
> Digester would only see the stream as if it were read sequentially without 
> skip/reset?
>   If we do reinvent the wheel, should we contribute this InputStream to 
> commons-io as an alternate to TeeInputStream?
>   Or, even more generally, are there other recommendations for handling this? 
>  Thank you!
>
>  Best,
>
>  Tim
>
> [0] http://mail-archives.apache.org/mod_mbox/commons-user/201508.mbox/%3CDM2PR09MB07135F86C7AC6981F1BB216BC78A0%40DM2PR09MB0713.namprd09.prod.outlook.com%3E






[codec] DigestInputStream that handles mark, reset and skip?

2015-07-31 Thread Allison, Timothy B.
All,

Before I reinvent the wheel...is there an alternative to Java's 
DigestInputStream that handles mark, reset and skip?  If I read this JDK bug 
[0] correctly, Java's DigestInputStream won't be fixed until Java 9.

Over on TIKA-1701, we found that pre-digesting an InputStream and then 
resetting can lead to fewer attachments being extracted from truncated 
(corrupt) package files -- the digester hits the EOF exception on the package 
component before the still-intact child documents can be extracted.

Cheers,

Tim

[0] https://bugs.openjdk.java.net/browse/JDK-6587699
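For context, the JDK behavior at issue takes only a few lines to show: skip() on java.security.DigestInputStream delegates to the underlying stream and bypasses the digest, so a skipped-over stream digests differently than a fully-read one. (`SkipDemo.digestWithSkip` is an illustrative helper, not library API.)

```java
import java.io.ByteArrayInputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

public class SkipDemo {
    // Digests a stream while consuming `skipBytes` via skip() after the
    // first read. DigestInputStream only overrides read(), so the skipped
    // bytes never reach the MessageDigest.
    public static byte[] digestWithSkip(byte[] data, long skipBytes) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (DigestInputStream dis =
                     new DigestInputStream(new ByteArrayInputStream(data), md)) {
            dis.read();                   // first byte is digested
            dis.skip(skipBytes);          // these bytes bypass the digest
            while (dis.read() != -1) { }  // remaining bytes are digested
        }
        return md.digest();
    }
}
```

With skipBytes = 0 the result matches MessageDigest over the whole array; with any mid-stream skip it does not.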


math3.fraction.Fraction :: overflow exception with maxDenominator

2013-06-20 Thread Allison, Timothy B.
I'm getting an overflow exception when I try to create a fraction with a 
maxDenominator from a double that is very close to a simple fraction.  For 
example:

double d = 0.51;
Fraction f = new Fraction(d, 10);

According to https://issues.apache.org/jira/browse/MATH-181, there are two 
separate use cases: one for a user-specified epsilon (in which case 
maxDenominator is set to Integer.MAX_VALUE), and one for a user-specified 
maxDenominator (in which case, epsilon is set to 0.0f).

If I add a check for whether q2 is > maxDenominator before the potential 
overflow throw, I no longer have the problem mentioned above. The overflow 
throw, I think, is designed for the user-specified epsilon use case, not the 
maxDenominator use case.

Is this a reasonable fix?  Should I submit a patch?

double r1 = 1.0 / (r0 - a0);
long a1 = (long) FastMath.floor(r1);
p2 = (a1 * p1) + p0;
q2 = (a1 * q1) + q0;

// proposed: fall back to the previous convergent once the
// denominator exceeds maxDenominator
if (q2 > maxDenominator) {
    this.numerator = (int) p1;
    this.denominator = (int) q1;
    return;
}
if ((FastMath.abs(p2) > overflow) || (FastMath.abs(q2) > overflow)) {
    throw new FractionConversionException(value, p2, q2);
}