[jira] [Closed] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.

2018-11-25 Thread Gaurav Mittal (JIRA)


 [ 
https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaurav Mittal closed COMPRESS-471.
--
Resolution: Workaround

Closing this issue; we are using the suggestions provided in the comments.

Thanks

> Zipped files names having non UTF-8 encoding are being replaced with '?' 
> while previewing file.
> ---
>
> Key: COMPRESS-471
> URL: https://issues.apache.org/jira/browse/COMPRESS-471
> Project: Commons Compress
>  Issue Type: Bug
>Affects Versions: 1.18
>Reporter: Gaurav Mittal
>Priority: Major
> Attachments: Document(▒Γ║╗)_20150226_11.zip, Incorrect.JPG, 
> correct.JPG
>
>
> | * All the strings which are not supported by UTF-8 are being replaced by 
> '?' symbol, 
> In the issue scenario the charset is 'Cp850', Since the common compress 
> library cannot identify the 'Cp850' charset and it takes the default charset 
> as 'UTF-8' therefore
>  we can see the '?' symbol
> In our code 
> ZipFile ret = new ZipFile(path);
> Moreover if we send the encoding in the function as defined below, it works 
> fine
> ZipFile ret = new ZipFile(new File(path), "Cp850",false);
> But the second scenario where we are forcibly giving the encoding as 'Cp850' 
> may cause side effects in some cases
>  --
> Below code does not seem to resolve UTF8 conflicts and could not make file 
> names into correct form -
>  
> try {
>  final Map entriesWithoutUTF8Flag =
>  populateFromCentralDirectory();
>  resolveLocalFileHeaderData(entriesWithoutUTF8Flag); 
>  success = true;
> } finally {
>  closed = !success;
>  if (!success && closeOnError) {
>  IOUtils.closeQuietly(archive);
>  }
> }|
> | |



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.

2018-11-25 Thread Gaurav Mittal (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16698487#comment-16698487
 ] 

Gaurav Mittal commented on COMPRESS-471:


Thank you for your suggestion.

I am closing this case.


[jira] [Created] (COMPRESS-472) 7Z Compress is very slow, appx. 1MB/second for mp4/mp3 files

2018-11-25 Thread Gaurav Mittal (JIRA)
Gaurav Mittal created COMPRESS-472:
--

 Summary: 7Z Compress is very slow, appx. 1MB/second for mp4/mp3 
files
 Key: COMPRESS-472
 URL: https://issues.apache.org/jira/browse/COMPRESS-472
 Project: Commons Compress
  Issue Type: Bug
Reporter: Gaurav Mittal


Hi,

I am using the Commons Compress library to compress files into the .7z format.

Compression is very slow, close to 1 MB/second, for files that barely
compress, such as mp4/mp3. This happens even with LZMA2 and the compression
preset set to 3 (the default is 6).

The same files compress faster with the zip compressor.

Is there a way to compress such files into the .7z format faster?

Thanks





[jira] [Commented] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.

2018-11-24 Thread Gaurav Mittal (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16697766#comment-16697766
 ] 

Gaurav Mittal commented on COMPRESS-471:


Hi,

I found the solution below:
{quote}// Needs java.util.Enumeration plus ZipFile and ZipArchiveEntry from
// org.apache.commons.compress.archivers.zip.
private boolean isUTF8Encoded(ZipFile zipFile) {
    boolean foundUTF8 = false;
    if (zipFile != null) {
        // Start from the encoding the ZipFile was opened with (null-safe comparison).
        foundUTF8 = "UTF8".equalsIgnoreCase(zipFile.getEncoding());
        Enumeration<ZipArchiveEntry> entries = zipFile.getEntries();
        if (entries != null && entries.hasMoreElements()) {
            ZipArchiveEntry entry = entries.nextElement();
            if (entry != null) {
                // Only the first entry is checked, using its general purpose bit (GPB).
                foundUTF8 = entry.getGeneralPurposeBit().usesUTF8ForNames();
            }
        }
    }
    return foundUTF8;
}
{quote}
If this method returns false, I can use the other ZipFile constructor with the
Cp850 charset and get the desired file names.

Please let me know whether this approach is fine.

Thanks
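
For reference, the general purpose bit check used above can be pictured at the byte level. The following is a minimal sketch using only the JDK, not Commons Compress code: the class and method names are hypothetical, and it assumes the caller already has the first bytes of a local file header. In the ZIP format, bit 11 of the general purpose bit flag marks names and comments as UTF-8. An unset flag does not prove that a name is not UTF-8, because some tools write UTF-8 or plain ASCII names without setting it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ZipFlagSketch {
    // Signature that starts every local file header ("PK\3\4", read little-endian).
    static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;
    // Bit 11 of the general purpose bit flag: names and comments are UTF-8.
    static final int UTF8_NAMES_FLAG = 1 << 11;

    /** Reads the general purpose bit flag from the first bytes of a local file header. */
    static int readGeneralPurposeFlags(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        if (header.length < 8 || buf.getInt(0) != LOCAL_HEADER_SIGNATURE) {
            throw new IllegalArgumentException("not a local file header");
        }
        // Layout: signature (4 bytes), version needed (2 bytes), flags (2 bytes).
        return buf.getShort(6) & 0xFFFF;
    }

    /** True when the UTF-8 names flag (bit 11) is set. */
    static boolean usesUtf8Names(int flags) {
        return (flags & UTF8_NAMES_FLAG) != 0;
    }
}
```

Checking a single entry, as the method above does, only describes that entry; an archive can mix flagged and unflagged entries.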


[jira] [Commented] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.

2018-11-21 Thread Gaurav Mittal (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16694557#comment-16694557
 ] 

Gaurav Mittal commented on COMPRESS-471:


I tried using the raw name, but that requires a lot of code changes, and they
would be difficult to manage across zip preview, the unzip process, dialog
boxes, etc.

I would like the library to set some flag when a ZipFile contains non-UTF-8
characters in its file names. For example, this call

foundUTF8 = zipFile.getEncoding().equalsIgnoreCase("UTF8");

could return true or false depending on the archive, or there could be a
boolean that tells whether non-UTF-8 characters are present in the zip file:

foundUTF8 = zipFile.isUTF8Encoding(); // proposed library API

On that basis we would be able to make more robust changes.

We know that the ZipFile constructor has everything needed to detect non-UTF-8
characters in file names, but it does not expose that to client code:

private ZipFile(SeekableByteChannel channel, String archiveName, String encoding,
        boolean useUnicodeExtraFields, boolean closeOnError) throws IOException {
    this.entries = new LinkedList();

    .

    try {
        Map entriesWithoutUTF8Flag = this.populateFromCentralDirectory();

        ..

    }

    .

}

Could you please change the library, or suggest some other way (other than the
raw name)?


[jira] [Commented] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.

2018-11-18 Thread Gaurav Mittal (JIRA)


[ 
https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691230#comment-16691230
 ] 

Gaurav Mittal commented on COMPRESS-471:


We are concerned about archives where Cp850 is not needed but we apply it
anyway, so that file names end up containing undesirable characters.

Query:

Is there any way to know that a particular zip file contains non-UTF-8
characters in its file names? If so, we can handle it at the application level.

Currently I do not see any method that reports non-UTF-8 characters, so we
cannot decide when to apply UTF-8 and when to apply another character encoding.

Is it possible to fix this on the library side?

Thanks
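
One possible application-level check, sketched here with the JDK only: decode the raw name bytes strictly as UTF-8 and fall back to another charset when that fails. The class and method names are hypothetical, and the sketch assumes the raw bytes are available to the caller (in Commons Compress they would presumably come from ZipArchiveEntry.getRawName()). It is a heuristic, not a guarantee: pure ASCII names pass either way, and bytes written in another charset can occasionally form valid UTF-8 by coincidence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class NameDecodingSketch {
    /** True when the bytes form well-formed UTF-8 (strict decoding, no replacement). */
    static boolean isValidUtf8(byte[] rawName) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(rawName));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decodes as UTF-8 when the bytes are valid UTF-8, otherwise with the fallback charset. */
    static String decodeName(byte[] rawName, Charset fallback) {
        Charset chosen = isValidUtf8(rawName) ? StandardCharsets.UTF_8 : fallback;
        return new String(rawName, chosen);
    }
}
```

A caller could pass Charset.forName("Cp850") as the fallback, provided the running JVM supports that charset.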


[jira] [Created] (COMPRESS-471) Zipped files names having non UTF-8 encoding are being replaced with '?' while previewing file.

2018-11-13 Thread Gaurav Mittal (JIRA)
Gaurav Mittal created COMPRESS-471:
--

 Summary: Zipped files names having non UTF-8 encoding are being 
replaced with '?' while previewing file.
 Key: COMPRESS-471
 URL: https://issues.apache.org/jira/browse/COMPRESS-471
 Project: Commons Compress
  Issue Type: Bug
Affects Versions: 1.18
Reporter: Gaurav Mittal
 Attachments: Document(▒Γ║╗)_20150226_11.zip, Incorrect.JPG, correct.JPG

All file names that are not valid UTF-8 are displayed with their characters
replaced by the '?' symbol.

In the failing scenario the names are encoded in 'Cp850'. Commons Compress
cannot identify the 'Cp850' charset and falls back to its default, 'UTF-8',
so we see the '?' symbol.

In our code:
ZipFile ret = new ZipFile(path);

If we pass the encoding explicitly, as below, it works fine:
ZipFile ret = new ZipFile(new File(path), "Cp850", false);

But forcing the encoding to 'Cp850' in this way may cause side effects in some
cases.

 --
The code below does not seem to resolve UTF-8 conflicts and does not restore
the file names to their correct form:

try {
    final Map entriesWithoutUTF8Flag =
        populateFromCentralDirectory();
    resolveLocalFileHeaderData(entriesWithoutUTF8Flag);
    success = true;
} finally {
    closed = !success;
    if (!success && closeOnError) {
        IOUtils.closeQuietly(archive);
    }
}
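
The symptom can be reproduced outside the library. The sketch below is an illustration using only JDK string decoding, not Commons Compress itself; the class name and the sample file name are made up. It decodes the same bytes once as UTF-8 and once as Cp850. Bytes that are not valid UTF-8 become the replacement character U+FFFD, which many user interfaces show as '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeSketch {
    /** Decodes raw name bytes leniently: malformed input becomes U+FFFD. */
    static String decode(byte[] rawName, Charset charset) {
        return new String(rawName, charset);
    }

    public static void main(String[] args) {
        // A name containing two accented letters as a Cp850 writer would
        // store it: in Cp850 the byte 0x82 is 'e' with an acute accent.
        byte[] raw = {'r', (byte) 0x82, 's', 'u', 'm', (byte) 0x82, '.', 't', 'x', 't'};

        // Decoded as UTF-8, each 0x82 is malformed and gets replaced.
        System.out.println(decode(raw, StandardCharsets.UTF_8));

        // Decoded as Cp850 (when this JVM provides it), the accents survive.
        if (Charset.isSupported("Cp850")) {
            System.out.println(decode(raw, Charset.forName("Cp850")));
        }
    }
}
```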


